ANIMISTIC THINKING AMONG COLLEGE AND
HIGH SCHOOL STUDENTS IN THE NEAR EAST 


WAYNE DENNIS 
Brooklyn College, Brooklyn, N. Y. 


INTRODUCTION 


In an earlier publication (4) the writer reported that approxi- 
mately one-third of the students in several American colleges and 
universities attribute life to one or more inanimate objects. This 
finding has been corroborated by several other investigators (1, 2, 
6, 9). 

In comparison to many countries, the level of education in gen- 
eral, and the level of scientific knowledge in particular, are high in 
America. If animistic thinking is present in one-third of American 
college students, it seems likely that it is even more prevalent in 
countries in which the educational levels are lower, and in which 
natural science concepts and information are less widely dissemi- 
nated than in the United States. The present paper reports data 
relative to this deduction, derived from college and high school 
students in the Near East.


PROCEDURE 


Each subject was given a mimeographed sheet containing the 
following instructions: 

“In connection with each of the objects listed below, indicate 
whether it is living or not living. Give a reason for each answer. 
Please answer the questions in the way that you think they would 
be answered by a biologist. For each object, the question is, is the 
object living in the same way that animals and plants are living?” 

Below the instructions appeared the names of seven objects. For 
the subjects who were college students the objects were: a lighted 
match, the sun, the wind, the sea, lightning, a pearl and atoms. On 
the blanks given to high school students, “a river” was used instead
of “the sea” and “petroleum” instead of “atoms.” In the
case of college students, the questions were presented and answered 
in English, which was their language of instruction. Since the teach- 
ing of the high school subjects was in Arabic, in their case the 
questions were presented and answered in that language. 


SUBJECTS 


The college students were enrolled in a women’s college and a
university in Beirut, Lebanon. Although these institutions are lo- 
cated in Lebanon, their students come from many Near Eastern 
countries. In the university, only male students were used as sub- 
jects. The majority were students in the college of arts and sciences. 
High school data were obtained from three public high schools in 
Iraq. Two of these schools were located in Mosul and one in Bagh- 
dad. The latter school and one of the Mosul schools were day 
schools; the other was a night school attended primarily by adults 
who had discontinued their earlier education before the comple- 
tion of high school. 

In the colleges the animism questionnaire was given in the intro- 
ductory psychology course. Since the majority of students at these 
institutions take this course, the sampling is believed to be repre- 
sentative of students at these institutions. In the high schools, the 
questionnaires were administered to first-, second- and third-year 
students in required courses, and consequently, the results are be- 
lieved to be representative of these levels in these schools. 


RESULTS 


The number of subjects tested in each school and the per cent 
giving one or more animistic answers are indicated in Table I. 

The most arresting fact about Table I is the generally high fre- 
quencies of animistic thinking found in all groups. The lowest 
figure (fifty-three per cent) was obtained for men in the univer- 
sity. All other percentages lie between seventy-four per cent and 
ninety-five per cent. 

TABLE I.—NUMBER AND PER CENT OF STUDENTS
GIVING ANIMISTIC ANSWERS

                     College  College            Mosul  Mosul
                     Women    Men      Baghdad   Day    Night   Total
Number of students   130      75       115       286    141     747
Per cent giving one
or more animistic
answers              77       53       95        81     74      79

The difference between the college men and college women is
probably due to differences in the institutions from which they
come in regard to student selection and required science courses.
Sex comparisons in the high schools, in which admission policies
and science courses for the two sexes are identical, indicate no sex
differences. Comparisons of Christian and Moslem students, not
included in Table I, indicate these two groups do not differ in
frequency of animistic answers.

Table II indicates the frequency with which each object was said
to be living by the members of each group. It will be noted that the
lowest percentage obtained was for lightning. Among the university
men only sixteen per cent stated that it was living. The university
men yield a response frequency of twenty-one per cent for
both the burning match and the wind. The highest per cents obtained
were seventy-five per cent and seventy-eight per cent. In
both cases these refer to the pearl, and the scores were made by the
Iraqi high school groups. It will be noted that in four of the five
groups one-third or more of the total responses were animistic.

TABLE II.—PER CENT OF RESPONSES TO EACH QUESTION
WHICH WERE ANIMISTIC

                    College  College  Baghdad   Mosul Day  Mosul Night
                    Women    Men      Students  Students   Students
Match               47       21       50        27         42
Sun                 38       26       66        36         35
Wind                34       21       56        24         35
Sea                 41       24
Lightning           30       16       32        23         25
Pearl               26       24       78        75         42
Atoms               55       31
River                                 54        28         42
Petroleum                             49        23         40
Per cent of total
answers animistic   39       23       55        34         37

TABLE III.—REASONS GIVEN FOR ANIMISTIC ANSWERS

                                 College Women  Baghdad Students
                                 Per cent       Per cent
Moves, changes, does something   39             38
Supports or produces life        21             7
Has a beginning and an end       14             4
Grows                            16             23
Uses oxygen, produces energy     6              16
Miscellaneous                    4              12
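As a quick arithmetic check, the bottom row of Table II appears to be the unweighted mean of the seven item percentages in each column. The short Python sketch below recomputes that row from the tabled values. The dictionary name and layout are illustrative, and the value 21 for college men on the match is taken from the text.

```python
# Item percentages from Table II, listed in table order for each group.
# Names are illustrative; the 21 for college men on the match comes from the text.
table_ii = {
    "College Women": [47, 38, 34, 41, 30, 26, 55],
    "College Men": [21, 26, 21, 24, 16, 24, 31],
    "Baghdad": [50, 66, 56, 32, 78, 54, 49],
    "Mosul Day": [27, 36, 24, 23, 75, 28, 23],
    "Mosul Night": [42, 35, 35, 25, 42, 42, 40],
}


def percent_of_total(item_percents):
    """Unweighted mean of the item percentages, rounded to a whole per cent."""
    return round(sum(item_percents) / len(item_percents))


column_means = {group: percent_of_total(v) for group, v in table_ii.items()}

# Groups in which at least one-third of all responses were animistic.
at_least_one_third = [
    group for group, v in table_ii.items() if sum(v) / len(v) >= 100 / 3
]

print(column_means)
print(at_least_one_third)
```

The recomputed means match the tabled row, and four of the five groups reach the one-third mark, as the text states.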

In the case of two groups, the college women and the Baghdad 
students, the reasons given for positive answers have been ex- 
amined and classified. The results are shown in Table III. It will be 
noted that in the two groups, respectively, thirty-nine per cent and 
thirty-eight per cent of the answers are what Piaget would call 
Stage II or Stage III answers: the object is living because it moves, 
either by itself or from some other cause. These Piaget classifies 
as non-adult answers. While these answers occur frequently in our 
group, the simplest sort of answer obtained from children, namely 
that an object is living because it is useful, was rarely encountered 
among college and high school students. Informal studies of Leban- 
ese elementary school pupils which will not be reported in detail 
show that Stage I answers (answers in terms of use) predominate
at the earlier age levels in Lebanon, as they do elsewhere (3, 8). 

The statement that an object is living because it grows should 
perhaps not be listed as an instance of wrong reasoning. The rea- 
soning is not incorrect; only the facts are wrong. Among the Iraqi 
subjects this answer was often given in respect to the pearl. 

The statements that an object, such as the burning match, is 
living because it is using oxygen illustrate the fact that science 
education does not always operate to reduce animistic answers. 
Apparently the students had learned that living things use oxygen, 
and they reasoned that whatever uses oxygen, must be living. 

College and high school students give two types of answers 
which are seldom given by children. The answers occur among 
American college students (4) as well as in the present sample. One 
is that the sun, or the sea, or some other object is living because it 
produces, or is necessary for, life. This answer seems to result from
the premise that like produces like, and therefore, that whatever is 
responsible for, or necessary for life, must itself be living. The sec- 
ond notion is that since life has a beginning and an end, all things 
which have a beginning and an end, as do a flame and a flash of 
lightning, are therefore living. These answers have not been re- 
ported by Piaget in his four stages of animism, presumably be- 
cause he questioned only children. They represent categories or 
“stages” in excess of his four. 


DISCUSSION


It is obvious from the present study, as well as from earlier ones 
(3, 4), that Piaget’s assumption that animism is limited to children
is incorrect. It seems probable that a degree of animistic thinking 
can be found in some adults in any society. The actual percentage 
of animistic replies will differ from question to question and from 
group to group. 

While the cultural area in which the present study was conducted 
was a center of one of the earliest civilizations, extensive modern 
educational programs have been introduced only recently. Science 
education, as compared to linguistic, literary and historical in- 
struction, is undoubtedly still weak. These two facts probably ac- 
count in the main for the high incidence of animistic replies to our 
questions. 

We do not believe that the prevalence of animistic thinking in 
our subjects is due to positive cultural influences. Christianity and 
Islam, to which our subjects belong, both reject the view that ob- 
jects of nature and forces of nature are themselves alive. Both of 
these religions would reject such ideas as tending toward idolatry. 

We have indicated earlier (3) our belief that animism arises 
autogenously in individuals who lack an adequate background in 
modern biological concepts and knowledge. The persistence of ani- 
mism into the adult years, in the Near East, is to be accounted for 
in terms of educational handicaps rather than in terms of a posi- 
tive animistic tradition. 


SUMMARY 


Seven hundred and forty-seven Near Eastern college and high 
school students were questioned in regard to whether each of several
objects is living or not living. Seventy-nine per cent gave one
or more animistic answers. In the various groups the frequencies 
of animistic answers were much higher than those obtained among 
American students of comparable educational placement. The dif- 
ferences are presumed to reflect the differences in diffusion of scien- 
tific concepts and information in these populations. 
EFFECTS OF PRAISE AND REPROOF ON
READING GROWTH IN A NON-LABORATORY
CLASSROOM SETTING


HARRY F. SILBERMAN 
Division of Teacher Education 


Municipal Colleges of the City of New York 


This report compares teachers whose verbal behavior is charac- 
terized by different amounts and combinations of praise and re- 
proof. Praise and reproof are verbal incentives that have been 
found to enhance performance on learning tasks (1, 3, 8, 9). In 
some experiments praise has been superior to reproof in its effect 
upon performance (1). In other experiments, reproof has been 
shown to be more effective (2). Three sources of variation which 
might account for the conflicting results obtained in these studies 
are: (1) school or teacher differences, (2) motivational differences 
among pupils engendered by novel laboratory settings, and (3) 
differences in operational definitions of praise and reproof. 

The present study, using forty-nine teachers in grades three to 
six in nineteen schools, is designed to study the effects of praise 
and reproof in teachers’ verbal behavior on reading growth of 
pupils in non-laboratory classroom settings. It was the intention 
in this study to investigate not only praise and reproof but the 
combination of the two. The combination of praise and reproof 
is obtained by multiplying the praise scores by the reproof scores. 
A convenient way of interpreting this product variable is provided 
by the following table: 





                     PRAISE
                    hi    lo
              hi    a     b
   REPROOF
              lo    c     d
This table shows four possible combinations of praise and reproof
in teachers’ verbal behavior. If the PR variable (the praise-reproof
combination) is significantly related to the reading growth criterion,
teachers in the “a” quadrant differ from teachers in the “d” quadrant
in promoting reading growth, provided other relevant factors have
been controlled.
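To make the product variable concrete, the following Python sketch multiplies praise and reproof scores for four invented teachers, one per quadrant. The scores and names are illustrative assumptions, not data from the study.

```python
# Hypothetical (praise, reproof) scores for four teachers, one per quadrant.
teachers = {
    "a": (60, 60),  # high praise, high reproof
    "b": (40, 60),  # low praise, high reproof
    "c": (60, 40),  # high praise, low reproof
    "d": (40, 40),  # low praise, low reproof
}


def praise_by_reproof(praise, reproof):
    """Product variable: the praise score multiplied by the reproof score."""
    return praise * reproof


pr_scores = {q: praise_by_reproof(p, r) for q, (p, r) in teachers.items()}

for quadrant, score in pr_scores.items():
    print(quadrant, score)
```

With positive scores the product is largest in quadrant a and smallest in quadrant d, while b and c fall between them, which is why the comparison of interest is a against d.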

Two additional variables, teachers’ verbal output as measured 
by the number of statements teachers make and the relative 
amount of time spent on reading skills, were also chosen for study. 

The hypothesis of this study, stated in the null form, is that 
measured reading growth is unrelated to a weighted combination 
of the relative amounts of praise, reproof, combined praise by re- 
proof, verbal output, and the observed amount of time spent on 
reading skills. If this hypothesis is rejected, the alternatives are 
that either praise, reproof, praise by reproof, verbal output, time 
on reading skills, or some combination of these five independent 
variables, is related to measured reading growth of pupils. 


PROCEDURE 


Sample. This study was conducted during the school year 1954- 
55 using twenty-three third-grade classes, thirteen fourth-grade 
classes, nine fifth-grade classes and four sixth-grade classes located 
within nineteen New York City public elementary schools. Test 
data were collected from a total of nine hundred and four pupils. 
All forty-nine teachers were in their first year of teaching. They 
constitute a sub-sample drawn from sixteen hundred and twenty- 
eight members of the student teaching classes of 1953-54 at the 
municipal colleges of New York City: City, Hunter, Brooklyn and 
Queens. Descriptive information about the sixteen hundred and 
twenty-eight student teachers from whom the forty-nine teachers 
in this study were drawn is given in a previous publication (6). 

Classroom observations. Six classroom observers were employed, 
each observer paying each of the forty-nine teachers two half-hour 
visits during the school year. Each teacher was visited by only one 
observer at a time. An observational schedule and rating form, details
of which are given in a previous report (5), was developed
for use by the classroom observers. The dimensions in the observa- 
tion schedule with which this study is concerned are called “ex- 
pressive behaviors” and “subject-matter emphasis.” 

The “expressive behaviors” dimension, adapted from Withall 
(11), consists of a group of seven spaces on an observation card
for tallying teachers’ verbal behavior and certain non-verbal be- 
haviors. The half-hour visits were divided into six five-minute ob- 
servation periods. During the second, fourth, and sixth five-minute 
observation periods the observers tallied every remark the teacher
made in one of the following categories of the “expressive behav- 
iors” dimension. 


(1) Supportive statements. Teachers’ remarks which trained observers 
construe as approval of the pupils or of their responses are tallied in this 
portion of the card. For example, comments such as “Right,” “Good,” “Fine 
work,” which inform the pupil that he or his responses adequately satisfy 
the standards which the teacher expects the pupil to meet, are classified as 
praise. 

(2) Problem-structuring statements. Statements in which the pupil is 
offered facts, ideas, opinions, or questions in an objective non-threatening 
manner. 

(3) Neutral statements. Statements not classified in any other category. 

(4) Directive statements. Statements that limit the pupils’ choice of be- 
haviors. Examples are: “Answer the questions on page 12” and “Pass the 
papers forward.” 

(5) Reproving statements. Tallies of hostile, disapproving verbal be- 
havior which the observers construe as threatening to the pupil. “You’re all 
wrong” and “Won’t you ever learn?” are illustrations of reproving statements. 


During the first series of classroom visits the six observers made 
a new tally on this dimension only when the teacher’s remarks 
shifted from one category to another. During the second series of 
classroom visits the six observers tallied the frequency of occurrence
of statements in each category.
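The two tallying rules can be sketched in Python as follows. The category codes and the sample sequence of remarks are invented for illustration; only the two counting rules come from the description above.

```python
from collections import Counter


def tally_on_shift(remarks):
    """First-series rule: add a tally only when the category changes."""
    tallies = Counter()
    previous = None
    for category in remarks:
        if category != previous:
            tallies[category] += 1
        previous = category
    return tallies


def tally_every_statement(remarks):
    """Second-series rule: add a tally for every statement."""
    return Counter(remarks)


# Hypothetical observation period: S = supportive, D = directive, R = reproving.
remarks = ["S", "S", "D", "D", "D", "R", "S"]

print(tally_on_shift(remarks))
print(tally_every_statement(remarks))
```

On the same sequence the first rule counts runs of remarks and the second counts individual remarks, so the two series are not on the same scale.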

The “subject-matter” dimension of the observation schedule is 
a group of ten categories for tallying the subject areas in which 
the activity of the class is centered. One tally was made in this 
section by each observer during every five-minute portion of the 
half-hour observation period. The subject areas are: reading, 
mathematics, language arts, social studies, science, recreation, arts 
and crafts, music, social processes and test periods. 

The first series of observations was arranged during February 
and March of 1955. Teachers were assured on several occasions 
that the observer was simply recording what went on in the class- 
rooms and was in no way judging or rating the teachers. They 
were told of the purely scientific interest of the investigators and 
were repeatedly assured that all information obtained was entirely 
confidential. The teachers were notified in advance of the date and
approximate time of each visit, and sometime near the appointed 
time an observer unobtrusively entered the classroom, taking a 
seat at the rear of the room. The observer proceeded to observe 
and place tallies on the observation schedule card reserved for that 
teacher, employing a stop-watch to time the five-minute portions 
of the half-hour visit. The second series of visits was arranged 
during March and April, 1955. 

The criterion (dependent variable). A detailed discussion of the 
development of the reading growth criterion is presented in another 
report (7) in the current series of research studies on teacher be- 
havior being performed by the Division of Teacher Education of 
the College of the City of New York (10). Reading ability of the 
pupils in all forty-nine classes was measured during November 
1954 and again during April or May of 1955 using the California 
Reading Test (Elementary): Word Form, Word Recognition, 
Meaning of Similarities, and Interpretation of Meanings. Form § 
of the California Test of Mental Maturity, Non-Language Section, 
was also administered during November, 1954. The reading growth 
indices to be used as a criterion in this study consist of the means of 
the final reading scores of the pupils in the teachers’ classes, ad- 
justed by covariance control for the mental maturity scores and the 
initial reading scores. This procedure effectively equates pupils 
with respect to mental maturity and previous achievement in read- 
ing. In the analysis of the data these reading indices were converted 
to normalized “T” scores.
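One common way to obtain covariance-adjusted class means of this kind is to estimate pooled within-class regression slopes of final reading on the two covariates and then shift each class mean to the grand covariate means. The Python sketch below follows that approach under stated assumptions: the pupil records are invented, the function names are hypothetical, and the source does not specify the exact computation that was used.

```python
def mean(values):
    return sum(values) / len(values)


def adjusted_class_means(classes):
    """classes maps a class name to a list of (final, initial, maturity) scores."""
    pupils = [p for rows in classes.values() for p in rows]
    grand_x1 = mean([p[1] for p in pupils])
    grand_x2 = mean([p[2] for p in pupils])

    # Pooled within-class sums of squares and cross-products.
    s11 = s22 = s12 = s1y = s2y = 0.0
    class_means = {}
    for name, rows in classes.items():
        my, m1, m2 = (mean([p[i] for p in rows]) for i in range(3))
        class_means[name] = (my, m1, m2)
        for y, x1, x2 in rows:
            d1, d2, dy = x1 - m1, x2 - m2, y - my
            s11 += d1 * d1
            s22 += d2 * d2
            s12 += d1 * d2
            s1y += d1 * dy
            s2y += d2 * dy

    # Solve the 2x2 normal equations for the pooled within-class slopes.
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det

    # Shift each class mean to the grand means of both covariates.
    return {
        name: my - b1 * (m1 - grand_x1) - b2 * (m2 - grand_x2)
        for name, (my, m1, m2) in class_means.items()
    }


# Invented pupils: final = 2 * initial + maturity + class effect (0 for A, 5 for B).
classes = {
    "A": [(25, 10, 5), (28, 12, 4), (35, 14, 7), (38, 16, 6)],
    "B": [(53, 20, 8), (59, 22, 10), (62, 24, 9), (69, 26, 12)],
}

adjusted = adjusted_class_means(classes)
print(adjusted)
```

In this constructed example the raw class means differ mostly because class B started higher; after adjustment the two classes differ only by the built-in class effect of 5 points.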

Composition of the independent variables. This study entailed 
the use of six variables, consisting of one dependent variable 
described above and five independent variables. 

The five independent variables are operationally defined as 
follows: 


(1) Praise. Total number of supportive statements tallied. 

(2) Reproof. Adjusted total number of reproof statements tallied. 

(3) Praise by Reproof. A composite of praise and reproof consisting of the 
product of the number of supportive and adjusted number of reproof state- 
ments tallied. The highest “PR” score would be obtained by a teacher whose 
verbal behavior is characterized by the largest amounts of praise and re- 
proof in combination. 

(4) Verbal Output. Total number of verbal statements tallied in the “ex- 
pressive behaviors” dimension of the observation schedule for each teacher. 
This is an estimate of the relative amount of classroom verbal behavior
exhibited by each teacher.

(5) Reading Time. Total number of tallies for each teacher in the reading 
section of the “subject-matter” dimension of the observation card. This con- 
stitutes an estimate of the relative emphasis or amount of time that each 
teacher devoted to specific instruction in reading. 


Preliminary analysis of the five independent variables indicated 
that the 49 teachers did not rank differently on the two series of 
observations on all but the reproof scales. As a result, the number 
of tallies on the two series were added for each teacher on all but 
the reproof scales. The reproof measures obtained on each of the 
two series of observations were given equal weight by converting 
the measures on each series to standard scores and then multiply- 
ing the derived scores by the standard deviation of the series with 
the largest variance. The obtained scores were then combined to 
form the reproof score for each teacher. 
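The equal-weighting step for the reproof measures can be sketched in Python as below. Two points are assumptions rather than statements from the text: the rescaled scores are combined by simple addition, and the population standard deviation is used. The tallies are invented.

```python
from statistics import mean, pstdev


def combine_reproof(series_1, series_2):
    """Equal-weight combination of two reproof series, one value per teacher."""
    sd_1, sd_2 = pstdev(series_1), pstdev(series_2)
    larger_sd = max(sd_1, sd_2)

    def rescale(series, sd):
        m = mean(series)
        # Standard score, then rescaled to the spread of the more variable series.
        return [(x - m) / sd * larger_sd for x in series]

    first = rescale(series_1, sd_1)
    second = rescale(series_2, sd_2)
    # Assumption: "combined" is taken to mean summed.
    return [a + b for a, b in zip(first, second)]


# Hypothetical reproof tallies for four teachers on the two series of visits.
first_series = [1, 2, 3, 4]
second_series = [10, 30, 20, 40]

reproof_scores = combine_reproof(first_series, second_series)
print(reproof_scores)
```

After rescaling, both series have the same standard deviation, so neither dominates the combined score merely because it was tallied on a larger scale.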

In the analysis of the data all variates were converted to normal- 
ized “T” scores. Differences among teachers were significant at 
the 0.01 level on each of the six variables. The reliabilities of these 
variables, when estimated by analysis of variance (2, 7), are shown
in Table I. 
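A normalized T score is conventionally obtained by converting each raw score to a percentile rank, finding the corresponding normal deviate, and rescaling to a mean of 50 and a standard deviation of 10. The Python sketch below assumes that convention with mid-rank percentiles; the source does not describe its exact procedure, and the raw scores are invented.

```python
from statistics import NormalDist


def normalized_t_scores(raw):
    """Convert raw scores to normalized T scores (mean 50, SD 10)."""
    n = len(raw)
    standard_normal = NormalDist()
    t_scores = []
    for x in raw:
        below = sum(1 for v in raw if v < x)
        ties = sum(1 for v in raw if v == x)
        # Mid-rank percentile keeps the proportion strictly between 0 and 1.
        proportion = (below + 0.5 * ties) / n
        z = standard_normal.inv_cdf(proportion)
        t_scores.append(50 + 10 * z)
    return t_scores


raw_scores = [3, 8, 8, 15, 40]  # hypothetical tallies for five teachers
t = normalized_t_scores(raw_scores)
print([round(v, 1) for v in t])
```

Because the conversion depends only on rank, the outlying raw score of 40 is pulled in toward the rest of the distribution.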


RESULTS AND DISCUSSION 


The intercorrelations among the independent variables in this 
study are shown in Table II. Both praise and reproof are sig- 
nificantly related to total verbal output. 

The relationships between the reading growth scores and the
independent variables are shown in Table III. The multiple correlation
is not significant at the 0.05 level; hence, this sample does not
contradict the hypothesis that in the population there is no correla- 
tion between reading growth and a weighted combination of the 
five independent variables. In other words, the independent vari- 
ables do not predict the reading growth of pupils in this sample. 


TABLE I.—ESTIMATED RELIABILITIES OF EXPERIMENTAL VARIABLES

Reading growth   0.87
Praise           0.71
Reproof          0.84
Verbal Output    0.80
Reading Time     0.64
TABLE II.—INTERCORRELATIONS OF FIVE INDEPENDENT VARIABLES

                                      Praise by  Verbal    Reading
                    Praise  Reproof   reproof    output    time
Praise                      -.1605    +.6370**   +.6057**  +.1839
Reproof                               +.6274**   +.4304**  -.0036
Praise by reproof                                +.7973**  +.2050
Verbal output                                              +.2745
Reading time

** Significant at the 0.01 level.


TABLE III.—RELATIONSHIPS BETWEEN READING GROWTH SCORES AND
FIVE INDEPENDENT VARIABLES

                                       Partial Regression  Standard
Independent Variables  Correlation r   (Beta Weights)      Errors
Praise                 +.0258          +.2806              .5657
Reproof                -.1539          +.1125              .5597
Praise by reproof      -.1282          -.4522              .7075
Verbal Output          -.0519          +.1432              .2574
Reading Time           -.1951          -.1929              .1616

Multiple Correlation Ry.12345 = .2795.
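The tabled values can be cross-checked with the standard identity that the squared multiple correlation equals the sum, over predictors, of each beta weight times that predictor's correlation with the criterion. The Python sketch below applies the identity to Table III, reading the first numeric column as correlations with reading growth (an interpretation of the table, not a statement from the source). It also forms the usual F ratio for a multiple correlation with 49 teachers and 5 predictors.

```python
from math import sqrt

# (correlation with reading growth, beta weight) for each predictor in Table III.
table_iii = {
    "Praise": (0.0258, 0.2806),
    "Reproof": (-0.1539, 0.1125),
    "Praise by reproof": (-0.1282, -0.4522),
    "Verbal output": (-0.0519, 0.1432),
    "Reading time": (-0.1951, -0.1929),
}

# Squared multiple correlation as the sum of r * beta over the predictors.
r_squared = sum(r * beta for r, beta in table_iii.values())
multiple_r = sqrt(r_squared)

# F ratio for testing R = 0 with k predictors and n cases.
n_teachers, k = 49, 5
f_ratio = (r_squared / k) / ((1 - r_squared) / (n_teachers - k - 1))

print(round(multiple_r, 4), round(f_ratio, 2))
```

The recomputed multiple correlation agrees with the tabled .2795, and the F ratio falls below 1, which is in line with the non-significant result reported.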


In view of the non-significant relationships between the criterion 
and the independent variables employed in this study, no definite 
conclusions can be drawn. Whether or not an increase in the num- 
ber of teachers observed, or other precautions designed to enhance 
the precision of the experiment, would have produced different 
findings is a matter of conjecture. Perhaps the observed teacher 
behaviors were atypical due to the presence of the observers in 
the classroom. If such were the case, the biased observations would 
have produced independent variables that are not representative 
of the teacher, and the negative results would hardly be surprising. 
A methodological limitation which should be noted in this study 
was the inability to assign pupils to the forty-nine classrooms at 
random. Differences among teachers on the independent variables 
might consequently be attributed to differences in the pupils as- 
signed to those teachers. A teacher who issues large amounts of re- 
proof, for instance, might argue that she is the victim of an emo- 
tionally disturbed class and, given a better class, would certainly 
produce a more permissive climate and more growth in reading.
These are problems for further research. 

The results of this study indicate that if there is a relationship 
between the verbal incentives and learning to read, it is a low re- 
lationship. If we accept this interpretation, certain implications 
for further exploration are evident. Many educators extoll praise 
and mild reproof as classroom incentives and offer a carte blanche 
endorsement of both as if praise and reproof contained some in- 
herent merit. Yet, praise and reproof, as defined in this investiga- 
tion, did not have appreciable effects on reading growth. Perhaps 
classroom situational variables not observed in this study are 
much more important to pupil growth than praise and reproof 
per se. The nature of the situations in which the incentives were 
employed and the nature of the pupils receiving the incentives 
may have been much more important variables to observe. At any 
rate, in this study praising and reproving remarks were tallied 
whether or not the incentives were contingent upon the correctness 
of pupils’ reading responses. The teachers’ verbal behavior during
the teaching of the criterion skill as well as during the teaching 
of other skills was used. 

Granting these limitations, the present study does not justify a 
blanket endorsement of the use of praise or reproof per se in the
classroom, and points up the need for the verification of labora- 
tory experiments by observational studies of normal classroom 
behavior. 


SUMMARY 


Forty-nine beginning teachers were each visited twelve times 
during the 1954-55 school year and their verbal behavior was 
categorized and tallied. Pupils were given reading tests at the be- 
ginning and end of the school year. 

A multiple correlation was computed between reading growth 
and five independent variables, namely, praise, reproof, praise by 
reproof, verbal output, and time devoted to reading skills. Results 
of the analysis did not yield significant relationships between the 
independent variables and reading growth of pupils. 

A discussion of the limitations of the study and of possibilities 
for further research on this problem was presented. 
EFFECT OF STEREOTYPED ATTITUDES ON
LEARNING 


ROBERT E. EGNER and ALVAN J. OBELSKY 


One of the most recent and promising of the new developments 
in educational testing has been the work done in measuring stereo- 
typed conditioning in students. Students arriving in college are 
hardly the “tabula rasa,” attitudinally speaking, that college 
educators seem to have long taken for granted. Each has been 
subject to a unique schooling of attitude conditioning in the 
family, church, neighborhood, and in pre-college education. Some 
have relatively fixed and naive opinions on topics about which 
they are usually not yet competent to express an opinion; others 
are more flexible and sophisticated and “liberal” in their attitudes. 
It is this sort of psychological trait that stereotypy testing is 
designed to measure. 

The development of the stereotypy syndrome and its measure- 
ment are of quite recent origin. The monumental study, The 
Authoritarian Personality,¹ was the first large-scale and systematic
attempt to evaluate factors concerned with stereotyped conditioning,
particularly where it concerned racial discrimination.
This work, both because of the importance of its subject content 
and its methodological analysis, stimulated further work in other 
phases of stereotyped conditioning, most noticeably as it applied 
to political attitudes.² Its implication for the educational field,
however, had not been missed and within the past five years a 
considerable amount of work, primarily in the development of 
valid tests of stereotypy, has been done. It is interesting to note 
that the first public presentation dealing with the relevance of
stereotypy to the learning process dates only from 1952,³ while
the earliest devised test for measuring stereotypy in students,
that of the American Council on Education, appeared in 1951.
Since that time “Inventory of Beliefs” tests, essentially adaptations
of the American Council on Education test, have been used
at a number of institutions, notably at the University of Chicago.
During the past three years at Northland College the Form T
test (used at the University of Chicago) has been given to students
in a required junior-year philosophy course by Professor R. E.
Egner, and the combined results analyzed and significant correlations
examined. The results of these efforts will be briefly
described in the following paragraphs.

¹ T. W. Adorno, et al., The Authoritarian Personality (Harper, 1950).
Credit, of course, should be given also to the earlier studies relating to
the theory and measure of personality syndromes conducted by Henry
Murray and his associates at the Harvard Psychological Clinic. H. Murray,
et al., Explorations in Personality (Oxford, 1938).

² See especially: Studies in the Scope and Method of the “Authoritarian
Personality,” edited by Christie and Jahoda (Harper, 1950). This book
provides a critique of The Authoritarian Personality and, in addition,
develops some interesting modifications and extensions of ideas contained in
the earlier work.

The Form T test used in this study is composed of an inventory 
of one hundred beliefs, of which sixty are common stereotyped 
expressions. The examinee is asked to indicate quickly his ac- 
ceptance or rejection of the statements. No way is provided for 
indicating a neutral position and all items must be answered. 
The test is designed to indicate only the direction in which the 
personality has been conditioned to move. The stereotyped per- 
sonality will show a marked tendency to accept the stereotyped 
beliefs and to reject the non-stereotyped statements; with the 
non-stereotyped personality, the two tendencies are reversed. A 
sample of the beliefs taken from the Form T test is shown below,
together with the appropriate key.

(A) I strongly agree or accept the statement. 
(B) I tend to agree or accept the statement. 
(C) I strongly disagree or reject the statement. 
(D) I tend to disagree or reject the statement. 
(1) Literature should not question the basic moral concepts 
of society. 
(2) Any man can find a job if he really wants to work. 
(3) Science will eventually explain the origin of life. 
(4) No task is too great or too difficult when we know that
God is on our side.
(5) A sexual pervert is an insult to humanity and should be
punished severely.
(6) The twentieth century has not had leaders with the vision
and capacity of the founders of this country.
(7) Other countries don’t appreciate as much as they should
all the help that America has given them.
(8) No censorship on the presumed morality of books and
movies can be justified.
(9) Our rising divorce rate is a sign that we should return
to the values which our grandparents held.
(10) Most intellectuals would be lost if they had to make a
living in the realistic world of business.

³ S. Goldberg and G. Stern, “The Authoritarian Personality and General
Education.” Paper presented at the 60th Annual Convention of the American
Psychological Association, September, 1952, Washington, D. C. Abstracted.
A recent and more complete study dealing with problems of stereotyped
behavior in college students may be found in chapter 10 of Methods of
Personality Assessment by Stern, Stein and Bloom (Free Press, Glencoe,
1956).

This test was administered over a period of three years to a 
total of two hundred and eighteen students enrolled in an intro- 
ductory philosophy course (junior level) which was required of 
all students at the college. The samples upon which the subsequent 
analysis is based represent students from this total who scored 
at the extremes on the test and who were also equated for intel- 
ligence.* Twenty low and twenty high scorers, composing the 
two samples, were selected for further investigation. The problem 
was to determine if any significant correlation existed between 
high and low scoring on the stereotypy test and performance in 
the various subject areas. If one interprets the significance of
stereotyped conditioning for learning ability in terms of the number
of successful students in each sample, the differences in performance
between the two groups readily become apparent. The
figures in parentheses indicate the number of students in each 
sample receiving a grade of “C” or above in the various areas 
of learning. 

                        Stereotype    Non-Stereotype
Humanities                 (12)            (19)
Social Science             (11)            (17)
Biological Science         (18)            (17)
Mathematics                (17)            (12)
Natural Science            (16)            (12)

It is clear that the non-stereotypes performed appreciably better 
in the Humanities and in the Social Science subjects, while in 
Mathematics and in the Natural Sciences the performance of the 





* In both samples students were selected whose performances on the
A.C.E. Psychological test were approximately the same; individual scores
were located within a range of eight centiles.
stereotypes was superior. In the Biological Sciences both groups
did about equally well.* In view of the observation that a higher
percentage of stereotypes tends to drop out in the first two years 
of college it is quite possible that a selective element is present 
in the sampling. Any adjustment for this, of course, would im- 
prove the showing of the non-stereotype group. 
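
The comparison of the two samples can be checked with a short calculation. The sketch below takes the counts of students receiving "C" or above from the table, assumes twenty students in each sample, and applies a Pearson chi-square test for a 2 x 2 table without continuity correction. The choice of test is an assumption made for illustration; the text does not name the test used. The function and variable names are likewise illustrative.

```python
import math

# Students (out of 20 per sample) with a grade of "C" or above,
# copied from the table above: (stereotype, non-stereotype).
PASSED = {
    "Humanities": (12, 19),
    "Social Science": (11, 17),
    "Biological Science": (18, 17),
    "Mathematics": (17, 12),
    "Natural Science": (16, 12),
}
GROUP_SIZE = 20              # twenty high and twenty low scorers
CRITICAL_5_PERCENT = 3.841   # tabled chi-square value, 1 degree of freedom


def chi_square_2x2(pass_a, pass_b, size=GROUP_SIZE):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table."""
    fail_a, fail_b = size - pass_a, size - pass_b
    n = 2 * size
    numerator = n * (pass_a * fail_b - fail_a * pass_b) ** 2
    denominator = size * size * (pass_a + pass_b) * (fail_a + fail_b)
    return numerator / denominator


def p_value_1df(chi_square):
    """Upper-tail probability of chi-square with one degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


RESULTS = {area: chi_square_2x2(*counts) for area, counts in PASSED.items()}

if __name__ == "__main__":
    for area, value in RESULTS.items():
        verdict = "significant" if value > CRITICAL_5_PERCENT else "not significant"
        print(f"{area:20s} chi2 = {value:5.2f}  p = {p_value_1df(value):.3f}  {verdict}")
```

Under these assumptions the Humanities and Social Science differences exceed the 5 per cent critical value, and the Mathematics difference falls somewhat short of it. A test with a continuity correction, or an exact test, would be more conservative with samples of this size.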

The above results, in spite of the small sampling taken, are 
closely in line with the observations made from similar studies 
at other institutions. The student most likely to succeed in courses 
which involve relatively high levels of flexibility of outlook is the 
non-stereotype. As one moves from the concrete to the abstract 
a larger percentage of stereotyped students begins to manifest 
evidence of frustration, culminating in more frequent academic 
failure, once their threshold of tolerance has been exceeded. Fig- 
ures would most likely underestimate their rate of failure since a 
substantially larger number of stereotypes withdraw from college 
before eventual failure. According to the University of Chicago 
examiner, Dr. Benjamin S. Bloom, about one fourth of the stereo- 
typed students drop out of the College of General Education by 
the end of their first quarter. The reasons given for these with- 
drawals provide a clue to the kind of atmosphere in which the 
stereotype thrives. Some of the most common follow: The educa- 
tion was not very practical; assignments were never very definite; 
the discussion technique represented a poor educational method 
(apparently because it lacked an authoritarian teacher) ; definite 
and final answers to problems in the humanities and the social 
sciences were never given. 

An educator who is reasonably alert to the performances of 
his individual students will have little difficulty in recognizing 
the presence of stereotyped attitudes and opinions among his 
students. Evidence can be found in abundance in written examina- 
tions, particularly in the essay type where the student enjoys 
more or less free rein in putting down his thoughts. Oral responses
to questions and private discussions with the instructor—





* In Humanities and Social Science the differences between the groups are
significant at the 5% level; in Mathematics the difference is just below
statistical significance.

* From an address delivered at the Workshop in Higher Education, North 
Central Association Study of Liberal Arts Education. University of Chicago. 
Summer, 1953.
indeed, any form of communication—will reflect these attitudes.
As long as the stereotype is communicative, and it is probably the 
case that as a group they are more inclined to be so, his char- 
acteristics can be readily observed. The “true” stereotype is 
entirely convinced, at least on an emotional plane, of the correctness
of his views and so does not hesitate to express them.
This is true of the “left” and “right” stereotypes; both insist on 
the correctness of their respective positions which are located at 
the extremes of the continuum line. To take an example from the 
economics class, a “left” stereotyped student will tend to view 
the capitalist system as one characterized by a highly unequal 
distribution of income and unequal economic opportunities for 
its members, and exploitation of the working classes by those in 
economic power. Those on the “right” consider the institution of 
private property a sacred mainstay of freedom, both economic 
and political, with inequalities a necessary driving force in the 
economy, and view the influence of labor as baleful to freedom 
and restrictive of the natural working of the economic system. 
It would not be difficult to find comparable examples of “left” 
and “right” stereotypy in the other academic disciplines. 

This “black” and “white” mentality of the stereotyped student 
can be so easily recognized (at least by an instructor who does 
not share the same species of stereotyped attitude) that a ques- 
tion very quickly arises concerning the usefulness of an elaborate 
test to detect its presence. The test clearly is of greatest use 
where large numbers of students are involved whose attitudes are 
as yet unknown. Under these circumstances it is convenient and, 
at the same time, it permits a rough quantitative appraisal of 
the degree of stereotypy present. At a large institution it could 
be given as part of a battery of tests and, if desired, correlations 
among performances on the stereotypy and other tests could be 
readily found. In the smaller school, where classes are generally 
limited and contact between students and instructor more in- 
timate, the value of the test is considerably diminished. 

An area which has received little attention is the matter of 
“de-stereotyping” the stereotyped student. One might think that 
this is a task for the psychological advisor, yet changing fixed 
and inadequate beliefs of students is a constant function of educa- 
tion. The extreme stereotype who holds to irrational beliefs with 
an obsessive force might, perhaps, be put down as a case in 
pathology. But the more common stereotype presents a problem
for the educator which differs only in degree from that which is 
faced in the ordinary affairs of education. Viewed in this light, 
the stereotype becomes simply a more intractable case to handle, 
though still amenable to the usual techniques of effective teach- 
ing. “Bare” facts, served in imposing dosages, cannot be ignored 
by the student for very long, and if the patience of the instructor 
is not lacking it should be a matter of time before a strongly 
entrenched stereotyped attitude begins to break down. Removal 
of the student from the emotional environment fostering the 
stereotyped conditioning would, of course, contribute a great deal 
to the progress. 

The entire problem of stereotypy has such broad implications 
for society as a whole that it seems superfluous to attempt to 
offer further justification for our concern with this subject. Studies 
of the authoritarian personality, which deal with a rather specific 
form of stereotypy, give adequate evidence that the fate of our 
democratic institutions themselves is at stake in this issue. The 
work now being done on stereotypy and the learning process 
should convince educators that they are in a more potent posi- 
tion to advance the cause of democratic society than might have 


appeared. 





LOGICAL VERSUS EMPIRICAL SCORING 
KEYS: THE CASE OF THE MTAI 


N. L. GAGE 


Bureau of Educational Research 


College of Education, University of Illinois 


The Minnesota Teacher Attitude Inventory (MTAI) (1), has 
scoring weights which are non-monotonically related to degree of 
agreement with the item. This is true of ninety of the MTAI’s 
one hundred fifty items. That is, the same scoring weight is as- 
signed to responses on both sides of the “undecided” position, or 
different scoring weights are assigned to responses on the same side 
of the “undecided” position. Such items can be identified wherever
the 0 weight falls somewhere other than between the +1 and -1
weights, or wherever two or more +1 or -1 weights are in non-adjacent
alternatives. (Non-monotonic weights would not be so
puzzling if they proceeded +1, -1, -1, -1, +1, or -1, +1, +1,
+1, -1. There are, however, no items like this in the MTAI.)

For example, consider Item 1, “Most children are obedient.” 
The scoring weights for this item are SA, +1; A, -1; U, 0; D, -1;
SD, 0. It is difficult to make psychological sense out of such scor- 
ing weights. The general direction or trend of the weights for all 
items does indeed assign positive weights to a permissive, child- 
centered orientation. But the non-monotonic weights defy any 
such conceptualization. And they imply that dimensions other 
than degree of agreement with the items are relevant to the test’s 
purpose. 
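
The notion of a non-monotonic scoring weight can be made concrete with a small check. The sketch below treats an item's weights as a list ordered SA, A, U, D, SD and calls the item monotonic when the weights never reverse direction across that order. This is a simplified reading of the criteria described above, offered as an assumption rather than as the procedure by which the ninety items were counted; the function name is illustrative.

```python
def is_monotonic(weights):
    """True if the weights never reverse direction from SA through SD."""
    pairs = list(zip(weights, weights[1:]))
    non_increasing = all(a >= b for a, b in pairs)
    non_decreasing = all(a <= b for a, b in pairs)
    return non_increasing or non_decreasing


# Published weights for Item 1, "Most children are obedient."
ITEM_1_PUBLISHED = [+1, -1, 0, -1, 0]   # SA, A, U, D, SD

# A hypothetical set of weights that falls steadily with disagreement.
HYPOTHETICAL_MONOTONIC = [+1, +1, 0, -1, -1]

if __name__ == "__main__":
    print(is_monotonic(ITEM_1_PUBLISHED))        # False
    print(is_monotonic(HYPOTHETICAL_MONOTONIC))  # True
```

Applied to a full scoring key, a check of this kind would give a count of the items whose weights cannot be read as a simple function of degree of agreement.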

Two defenses might be offered for these non-monotonic weights: 
(a) they tend to defeat attempts at faking; (b) the scoring weights 
are based on empirically determined differences between the re- 
sponses of one hundred very good and one hundred very poor 
teachers. 

Both these defenses can be pierced by available evidence. Thus, 
the MTAI has indeed been found to be fakable* (4), despite the
complex and apparently illogical scoring weights of many of the 
items. Second, experience has shown (3, p. 450) that very large 





* At least in one sense. 
samples (of the order of five hundred cases) are needed before
empirical scoring weights become stable. Hence, it would be sur- 
prising if the MTAI’s non-monotonic weights, based on only one 
hundred teachers in each extreme group, were not merely reflecting 
random fluctuations due to sampling error. 

We can therefore hypothesize that the non-monotonic, empiri- 
cally derived scoring weights have lower validity than a logical 
set of scoring weights. To test this hypothesis, two “logical” scor- 
ing keys were made: (a) a dichotomous-logical key in which both 
choices on a given side of “undecided” were weighted +1, and the 
remaining three choices were weighted 0; (b) a trichotomous- 
logical key, in which the same choices as in a were weighted +1, 
but the two choices on the other side of “undecided” were weighted 
-1. In each instance we weighted +1 that side of the response
continuum which contained the +1 weight in the published key. 
The following item illustrates the three keys whose validities could 
be compared: 


1. Most children are obedient.      SA     A     U     D    SD
   Published key                    +1    -1     0    -1     0
   Dichotomous-logical key          +1    +1     0     0     0
   Trichotomous-logical key         +1    +1     0    -1    -1
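
The rule for deriving the two logical keys from the published key can be sketched as follows. The weights are taken in the order SA, A, U, D, SD. The sketch assumes that the published +1 weight appears on only one side of “undecided”; an item for which that does not hold is rejected rather than guessed at. The function name is illustrative.

```python
def logical_keys(published):
    """Return (dichotomous, trichotomous) keys for one five-choice item.

    published: weights in the order SA, A, U, D, SD.
    The side of "undecided" that carries a published +1 is weighted +1.
    """
    plus_on_agree = 1 in published[:2]       # SA or A carries +1
    plus_on_disagree = 1 in published[3:]    # D or SD carries +1
    if plus_on_agree == plus_on_disagree:
        # Either no +1 on a side, or +1 on both: the rule does not decide.
        raise ValueError("cannot tell which side carries the +1 weight")
    if plus_on_agree:
        return [1, 1, 0, 0, 0], [1, 1, 0, -1, -1]
    return [0, 0, 0, 1, 1], [-1, -1, 0, 1, 1]


if __name__ == "__main__":
    dichotomous, trichotomous = logical_keys([1, -1, 0, -1, 0])  # Item 1
    print(dichotomous)   # [1, 1, 0, 0, 0]
    print(trichotomous)  # [1, 1, 0, -1, -1]
```

For Item 1 this reproduces the dichotomous-logical and trichotomous-logical rows shown above.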


We used data obtained in another study (2) involving ninety- 
seven of the ninety-eight teachers and all their twenty-seven 
hundred pupils in grades four to six of a midwestern city. Our 
criterion was the same as that used, in part, by the authors of the 
MTAI. This was the mean of the ratings given the teacher by 
her pupils on Leeds’ “My Teacher” rating scale. 

Table I shows that the split-half reliabilities of the scores 
obtained with the two “logical” keys were higher (0.93 and 0.94) 
than that of scores obtained with the empirical key (0.90). Hence, 
we do not need to depend on the empirical key to measure some- 
thing with the MTAI that is consistent over items. 

The validity coefficient of the trichotomous-logical key was 
higher than that of the published empirical key: 0.28 as against 
0.26. The important point is not that the logical key had a slightly 
higher validity. Rather it is noteworthy that it did not have lower 
validity. 

TABLE I.—RELIABILITY AND VALIDITY OF THREE TYPES OF MTAI SCORE
(N = 97)

                                           M      S.D.   Reliability   Validity
Criterion (Mean rating by pupils on
  “My Teacher”)                           20.7    10.4      .91*
Published empirical key for MTAI          56.7    35.5      .90**        .26
Dichotomous-logical key for MTAI         111.8    17.1      .93**        .23
Trichotomous-logical key for MTAI         88.2    28.9      .94**        .28

* Horst (4).
** Corrected split-half.

These findings have both practical and theoretical importance.
In future use of the MTAI, the published empirical key should be
reëvaluated in comparison with our “logical” key; it seems probable
that the empirical validity of the MTAI can be improved
by using a logical scoring key.

Of even greater potential significance is the implication that 
what the MTAI measures can be conceptualized in terms of rela- 
tively simple and communicable theory. If we can abandon the 
complex and arbitrary-seeming published scoring weights for the 
items, the scoring key becomes readily comprehensible and pre- 
dictable. Thus, we have found that three graduate students in 
educational psychology—instructed to assign scoring weights in 
accordance with a permissive, child-centered, non-authoritarian, 
and emotionally secure orientation—were able to predict the logical 
scoring weights unerringly. 


SUMMARY 


The Minnesota Teacher Attitude Inventory had slightly higher 
reliability and validity when scored with a logical key than when 
scored with the published empirically derived scoring weights, 
which are non-monotonic for ninety of the one hundred fifty items. 
Practically, this suggests reëxamination of the published key.
Theoretically, this means that the MTAI’s meaning becomes 
relatively easy to formulate. 
STABILITY OF INTRA-INDIVIDUAL
PATTERNING OF MEASURES OF
ADJUSTMENT DURING
ADOLESCENCE


FRED T. TYLER 


University of California 


INTRODUCTION 


According to certain theoretical positions, as indicated in vari- 
ous textbooks, there is a high degree of constancy and consistency 
in much of the individual’s behavior. Empirical evidence on this 
point is relatively scarce since problems related to the behavior 
of the individual over a period of time have received much less 
attention than have those dealing with the behavior of groups of 
individuals at some specific point of time. Yet understanding and 
prediction of an individual’s behavior depend upon knowledge con- 
cerning stability and consistency of his behavior: stability in the 
behavior of a group of subjects is no guarantee of stability in each 
member of the group. 

The importance of questions dealing with the stability of per- 
sonality is apparent: prediction is feasible if behavior is consistent 
over a period of time. According to Olson, prediction is possible 
when growth is continuous: “The fact that there is continuity to 
the growth process makes it possible to project curves and to pre- 
dict” (6, p. 36). However, the present writer believes that predic- 
tion depends more upon stability and consistency than upon con- 
tinuity of growth, since, in our opinion, growth may be continuous 
even while being irregular and erratic. Olson appears to accept the 
concept of stability of personality: “The stability of personality 
results in a strong resistance to displacement by temporary en- 
vironmental variations” (6, p. 276). 


ANALYSIS OF THE INDIVIDUAL 


Analyses of physical and behavioral data of the single case have 
been, in the past, primarily graphical in nature, although other 
procedures are beginning to appear. Kerlinger (5) employed an 
analysis of variance technique to investigate the validity of the 
principle of unity of growth, or the concept of “homogeneous” and
“heterogeneous growers,” to use his terms. However, the “sub- 
tractive” scores he used in his analysis do not seem appropriate 
since consideration of his methodology shows that the use of “sub- 
tractive” scores may indicate the presence of heterogeneity of 
growth when in fact the growth ages imply homogeneity; and vice
versa. Tyler (9) illustrated correlational [P-technique (1)] and 
factorial analyses of physical data for the individual subject. The 
effects of “staggering” longitudinal data to take into account the 
fact that growth in different physical characteristics is sequential 
rather than synchronous in nature are also under investigation by 
means of correlational analyses. Analysis of variance of a variety 
of behavioral data expressed in standard scores has also been used 
to investigate the validity of the principle of unity of growth.’ 

Spearman’s rho and Kendall’s W also appear to offer possibili- 
ties for the statistical analysis of data from the individual case. 
The present writer recently reported, for a series of behavioral 
measures, the values of rho between the measures obtained at two 
ages in the adolescent period for each of sixteen boys.* These rhos 
were 0.74, 0.71, 0.67, 0.63, 0.60, 0.54, 0.52, 0.46, 0.44, 0.42, 0.33, 0.09, 
0.08, and —0.42, indicating a tendency for some similarity in the 
patterning of the scores on the two occasions for some subjects, 
but not for others; the degree of agreement between measures 
having reference to two occasions varies from one individual to 
another, and even the largest rhos imply some considerable disa- 
greement among the rankings of the characteristics at different 
ages. 

The concept of unity among measures of the individual does not 
seem to allow for the possibility of individual differences in the 
degree to which such measures are idiomatic within the individual. 
However, it seems certain that there are individual differences in 
the extent to which the individual’s pattern of adjustment remains 
constant during the adolescent years. 

The data to be analyzed were obtained by means of the U.C. 
Inventory I, dealing with social and emotional adjustment, which 
was administered annually to the members of the California 
Growth Study (3). It is a self-report type of inventory, the items 
of which are organized around eight areas or categories of adjustment.
Their designations and reliability coefficients are shown in
Table I (7, 8).

* Manuscripts dealing with these analyses are in preparation.
* Paper read at the C.E.R.A. meeting, March, 1955.

TABLE I.—RELIABILITY COEFFICIENTS FOR CATEGORIES OF ADJUSTMENT

                                      Reliability
Category Name               Occasion       Occasion
                               4         1      2      3
Social maladjustment          .74       .71    .81    .77
Personal inferiority          .68       .62    .60    .45
Overstatement                 .71       .64    .58    .52
Family maladjustment          .71       .44    .40    .80
Physical symptoms             .75       .74    .68    .72
Fears                         .70       .69    .72    .91
Generalized tensions          .76       .78    .73    .76
School maladjustments         .67       .88    .92    .91

Our questions may be formulated as follows: 

(1) What is the degree of agreement within an individual among 
seriatim measures of adjustment obtained during adolescence? 

(2) Is the agreement significant for some subjects but not for 
others? 

(3) Is the agreement sufficiently large to permit reasonably 
accurate predictions from earlier to later ages of the individual? 

(4) Is the degree of agreement related to the individual’s gen- 
eral adjustment? 


THE ANALYSIS 


The nature of the questions to be considered and the method 
of analysis to be employed will be described by means of the data 
in Tables II and III; the measures are standard scores (based upon 
grade groups) for each variable at each chronological age. We may 
determine the extent to which each category of adjustment occu- 
pies the same rank among the other categories at each of several 
ages. Measures of the relationships between the ranks at different 
ages may be computed by means of Spearman’s rank-order 
method, as for the data in Tables II and III.* 

The values of rho for the three possible pairings of ages are 0.55, 





*Case 134 in the California Adolescent Growth Study. 
TABLE II.—STANDARD SCORES ON THE CATEGORIES OF THE U. C.
INVENTORY I BY CHRONOLOGICAL AGE
(Case 134)

                 Category of Adjustment
CA      1     2     3     4     5     6     7     8
11     51    20    50    36    33    72    46    45
13     76    61    57    54    45    63    70    66
17     71    44    49    48    43    72    54    48





























TABLE III.—RANK ORDERS OF STANDARD SCORES IN TABLE II

CA      1     2     3     4     5     6     7     8
11      2     8     3     6     7     1     4     5
13      1     5     6     7     8     4     2     3
17      2     7     4    5.5    8     1     3    5.5
Sum     5    20    13   18.5   23     6     9   13.5





























0.96, and 0.67, implying a tendency for the variables to occupy 
similar ranks at different ages. [This type of coefficient of correlation
is termed O-technique by Cattell (1)].
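
The rank-order computation can be illustrated with the standard scores of Table II. The sketch below ranks the eight categories within each age (rank 1 for the highest score, tied scores sharing the mean of their ranks) and applies the familiar formula rho = 1 - 6(sum of d squared)/(n(n squared - 1)). Whether the printed values were obtained with exactly this formula, or with some adjustment for ties, is not stated; the function names are illustrative.

```python
# Standard scores from Table II (Case 134), categories 1 through 8.
SCORES = {
    11: [51, 20, 50, 36, 33, 72, 46, 45],
    13: [76, 61, 57, 54, 45, 63, 70, 66],
    17: [71, 44, 49, 48, 43, 72, 54, 48],
}


def rank_descending(values):
    """Rank 1 = highest score; tied scores share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1   # mean of the tied positions, 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho by the sum-of-squared-rank-differences formula."""
    rx, ry = rank_descending(x), rank_descending(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    for first, second in [(11, 13), (11, 17), (13, 17)]:
        rho = spearman_rho(SCORES[first], SCORES[second])
        print(f"ages {first} and {second}: rho = {rho:.2f}")
```

The ranks produced in this way agree with those of Table III, and the three coefficients come out at about 0.55, 0.95, and 0.67, close to the values reported.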

The number of rhos to be computed increases rapidly as the 
number of ages for which data are available increases, so that the 
amount of computation increases rapidly with an increase in the 
number of rankings. In any case, a more general question, leading 
to a more general answer, and involving a limited amount of cal- 
culation, might be asked: Is there agreement among the ranks 
assigned to each category of adjustment for the three ages con- 
sidered? 

Kendall (4) proposed his measure of concordance, W, as a suitable
technique for answering such a question. If W is found to be
significant, we conclude that significant agreement exists among 
the patternings of the variables at all ages. We may then infer 
that there is stability in the pattern of an individual’s adjustment 
(although not necessarily in the level of adjustment) as he grows 
older. The value of W may be computed from data arranged as 
in Table III, in which the sums of the columns, required for the 
computation of W, are indicated. 
The coefficient of concordance is defined as follows (2):

$$W = \frac{\text{Sum of squares between columns}}{\text{Total sum of squares}}$$

$$\text{The between variance} = \frac{5^2 + 20^2 + \cdots + 9^2 + 13.5^2}{3} - \frac{3 \times 8 \times 9^2}{4}$$

$$\text{The total variance} = \frac{m(n^3 - n)}{12} = \frac{3(8^3 - 8)}{12}$$

(where m is the number of rows and n the number of columns).
Then, for the data of Table III:

The “between” variance = 102.17

and the “total” variance = 126

$$\text{so that } W = \frac{102.17}{126} = 0.81$$

The significance of W may be determined by means of the χ²
test:

$$\chi^2 = \frac{(n - 1)(\text{sum of squares between columns})}{(n^3 - n)/12} = \frac{7 \times 102.17}{(8^3 - 8)/12} = 17.03$$

where there are (n-1) degrees of freedom.

For our example, χ² = 17.03, which just fails to reach significance
at the 0.01 level. We conclude that there is a high probability
that the agreement among the patterns of adjustment at different 
ages is not due to chance. 

It might also be of interest to know the mean value of all the 
rank-order coefficients of correlation in our m X n table. This 
mean value may be determined from the following formula (2): 


$$\bar{\rho} = \frac{mW - 1}{m - 1} = \frac{3 \times 0.81 - 1}{3 - 1} = 0.72$$





In our example, the value of 0.72 is almost identical to the mean 
value of the three rank-order coefficients already reported for 
Table III. 
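
The whole computation for Table III can be followed in a few lines. The sketch below works from the ranks of Table III, forms the column sums, and applies the definitions given above for the between and total sums of squares, W, the chi-square test, and the mean rank-order coefficient. No correction for tied ranks or for continuity is applied here, and the function name is illustrative.

```python
# Ranks from Table III (Case 134); rows are ages, columns are categories.
RANKS = [
    [2, 8, 3, 6, 7, 1, 4, 5],        # CA 11
    [1, 5, 6, 7, 8, 4, 2, 3],        # CA 13
    [2, 7, 4, 5.5, 8, 1, 3, 5.5],    # CA 17
]


def concordance(ranks):
    """Return (between, total, W, chi_square, mean_rho) for an m x n rank table."""
    m = len(ranks)         # number of rows (occasions)
    n = len(ranks[0])      # number of columns (categories)
    column_sums = [sum(row[j] for row in ranks) for j in range(n)]
    between = sum(s * s for s in column_sums) / m - m * n * (n + 1) ** 2 / 4
    total = m * (n ** 3 - n) / 12
    w = between / total
    chi_square = (n - 1) * between / ((n ** 3 - n) / 12)   # n - 1 degrees of freedom
    mean_rho = (m * w - 1) / (m - 1)
    return between, total, w, chi_square, mean_rho


BETWEEN, TOTAL, W, CHI_SQUARE, MEAN_RHO = concordance(RANKS)

if __name__ == "__main__":
    print(f"between = {BETWEEN:.2f}, total = {TOTAL:.0f}")
    print(f"W = {W:.2f}, chi-square = {CHI_SQUARE:.2f}, mean rho = {MEAN_RHO:.2f}")
```

Carried to more decimal places, W is about 0.811 and the mean coefficient about 0.716; the figure of 0.72 follows when W is first rounded to 0.81.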
TABLE IV.—DISTRIBUTION OF VALUES OF W

Interval        Number of W’s

.80-.89
.70-.79
.60-.69
.50-.59
.40-.49
.30-.39
.20-.29
.10-.19
Similar computations were made for thirty boys, all members
of the California Adolescent Growth Study. For twenty-eight boys, 
eight categories of adjustment were available for ranking on each 
of seven annual occasions (i.e., chronological ages); measures 
were known for each of six occasions for the other two subjects. 
The value of W for significance at the 0.01 level is 0.38 for the 
7 X 8 Table, and .44 for the 6 x 8 Table (W was corrected for 
continuity and for tied ranks, although these corrections resulted 
in but minor changes in the value of W). 
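
The two critical values of W can be recovered, approximately, from the chi-square test. Since chi-square equals m(n - 1)W with n - 1 degrees of freedom, the smallest significant W is the tabled chi-square value divided by m(n - 1). The sketch below assumes the tabled 0.01 value of 18.475 for seven degrees of freedom and ignores the corrections for continuity and ties; the names are illustrative.

```python
CHI_SQUARE_01_DF7 = 18.475   # tabled chi-square, 7 degrees of freedom, 0.01 level


def critical_w(m, n, chi_square_critical=CHI_SQUARE_01_DF7):
    """Smallest W reaching significance for m occasions and n categories."""
    return chi_square_critical / (m * (n - 1))


if __name__ == "__main__":
    print(f"7 x 8 table: W = {critical_w(7, 8):.2f}")
    print(f"6 x 8 table: W = {critical_w(6, 8):.2f}")
```

These come out at about 0.38 and 0.44, in agreement with the values quoted.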

The distribution of the values of W is shown in Table IV. About 
50 per cent of the W’s are significant at the .01 level. We conclude 
that there are individual differences in the presence of significant 


TABLE V.—DISTRIBUTION OF VALUES OF THE MEAN OF RANK-ORDER
CORRELATIONS

Interval        No. of Mean Rhos

.80-.89
.70-.79
.60-.69
.50-.59
.40-.49
.30-.39
.20-.29
.10-.19
.00-.09
agreement among the rankings of the categories of adjustment
being considered. 

The mean values of all rank-order coefficients of correlation for 
each subject were also computed according to the formula already 
indicated. The distribution of mean rhos is shown in Table V.
Again, it is obvious that there are individual differences in the 
degree of agreement among the profiles at different chronological 
ages. Further, it appears that the prediction of later from earlier 
profiles must be accompanied by the possibility of large errors. 
The patterning of the categories of adjustment measured by the 
U. C. Inventory I does not confirm a hypothesis of unity of 
growth or of homogeneity of growth for these subjects. 


DISCUSSION 


It may be that the non-significant values of W are the results 
of changes in the rank orders of the categories in a profile that 
arose from the unreliability of the measures. If this is so, we might 
expect that the differences between the standard scores for successive
ranks in each profile will be less in the cases of those subjects
with the non-significant W’s than of the other subjects. We
determined the distribution of these differences for each of twenty 
subjects, ten with significant W’s and ten with non-significant 
W’s. The median value for each distribution was then computed; 
these are shown in Table VI. 

It is seen that the medians are very similar for the two groups; 
that is, the differences between the standard scores for consecutive 
ranks on any occasion may be as small in the case of the significant 
as of the non-significant W’s. The absence of significant agreement 
among the categories does not appear to be determined solely by 
unreliability of the measures of adjustment. Some might argue, 
however, that the presence of several non-significant values of W 


TABLE VI.—MEDIAN VALUES OF DIFFERENCES BETWEEN STANDARD
SCORES REPRESENTED BY CONSECUTIVE RANKINGS FOR TEN SUBJECTS
WITH SIGNIFICANT W’s AND TEN WITH NON-SIGNIFICANT W’s

Group                          Median Values
With significant W’s           3.0  3.5  3.9  4.0  4.0  4.0  4.1  4.5  5.0  7.0
With non-significant W’s       3.3  3.8  3.9  4.0  4.1  4.3  4.3  4.5  4.6  5.2
may be a reflection of unreliability of the measures of adjustment
(see Table I). 
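
The differences between standard scores for consecutive ranks may be illustrated with the profile of Case 134 from Table II. The sketch below orders the scores within each age, takes the differences between neighbors, and finds the median of the pooled differences. Pooling the differences across ages is an assumption made for illustration, since the exact procedure is not described. The two lists of medians are copied from Table VI.

```python
from statistics import median

# Standard scores for Case 134 (Table II), categories 1 through 8.
CASE_134 = {
    11: [51, 20, 50, 36, 33, 72, 46, 45],
    13: [76, 61, 57, 54, 45, 63, 70, 66],
    17: [71, 44, 49, 48, 43, 72, 54, 48],
}

# Median values from Table VI.
SIGNIFICANT = [3.0, 3.5, 3.9, 4.0, 4.0, 4.0, 4.1, 4.5, 5.0, 7.0]
NON_SIGNIFICANT = [3.3, 3.8, 3.9, 4.0, 4.1, 4.3, 4.3, 4.5, 4.6, 5.2]


def consecutive_rank_differences(scores):
    """Differences between scores holding successive ranks in one profile."""
    ordered = sorted(scores, reverse=True)
    return [a - b for a, b in zip(ordered, ordered[1:])]


# Pool the differences from every age for the one subject.
POOLED = [d for scores in CASE_134.values()
          for d in consecutive_rank_differences(scores)]

if __name__ == "__main__":
    print("median difference, Case 134:", median(POOLED))
    print("median of medians, significant W's:", median(SIGNIFICANT))
    print("median of medians, non-significant W's:", median(NON_SIGNIFICANT))
```

For Case 134 the median difference is 4 standard-score points, and the two groups of Table VI have central values of about 4.0 and 4.2, which bears out the similarity noted above.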

We may now consider the fourth question that was proposed 
earlier, namely, is general adjustment related to consistency of 
adjustment over a period of years? The subjects of the California 
Adolescent Growth Study were rated by each of three psycholo- 
gists at the end of the school period on each of five categories of 
adjustment, known as “manifest traits,” as follows: 

(1) Social prestige 

(2) Popularity with same sex 

(3) Work adjustments 

(4) Heterosexual adjustments 

(5) Personality adjustment 

The subjects were rated on a seven-point scale, the final rating 
being the average of the ratings by the three psychologists. Rat- 
ings on these five “manifest traits” were available for twenty of 
our thirty subjects, nine having non-significant W’s and eleven 
having significant W’s. The ratings of each subject on each trait 
are shown in Figure I; the mean ratings are also indicated for each 
group. Successful adjustment as rated on the “manifest traits” 
does not appear to be related to significant agreement among the 
FIGURE 1. Ratings on “manifest traits”; those with nonsignificant W’s
represented by the open circles.
categories of adjustment measured by the U.C. Inventory I during
the adolescent years. Stability in the patterning of the individual’s 
self-concept is not related to the psychologists’ concept of the sub- 
ject’s adjustment as evaluated by them at the end of the school 
period. (The rank-order correlations between the value of W and 
the ratings on each of the five “manifest traits” for the twenty 
subjects were 0.07, 0.16, 0.09, 0.16, and —0.03.) 


SUMMARY 


The patterns of adjustment, as revealed by a self-report inven- 
tory administered annually to the subjects of the California Ado- 
lescent Growth Study, were analyzed by means of Kendall’s W 
to determine the degree of agreement in the patterning of measures 
on eight categories of adjustment on each of seven annual occa- 
sions. The value of W was found to be significant for one half of 
the subjects. The presence of non-significant W’s in some subjects 
did not seem to arise simply from the unreliability of the measur- 
ing instruments since the median differences between the standard 
scores of consecutive ranks within each profile were about the 
same for the subjects with significant W’s as for those with non- 
significant W’s. 

The presence of stability in the patterning of the individual’s 
self-concept on the eight categories of adjustment was not related 
to successful adjustment as judged by three psychologists at the 
end of the school period. 

It is believed that the nature of the results reported is not favor- 
able to the hypothesis of a high degree of stability in the adoles- 
cent’s reported self-concept. There are individual differences in 
the degree of stability; in all cases, prediction of later from earlier 
measures of adjustment would result in some considerable degree 
of error. 

There remains, of course, the problem of inferring the nature 
of the individual’s behavioral characteristics from a knowledge of 
the changes in the scores on a self-report inventory. It may be ar- 
gued that the individual’s pattern of adjustment is, in reality, more 
stable than the profiles imply. Changes in the patterns may reflect 
changes in expressions of adjustment and maladjustment. An ado- 
lescent’s relationships within his family may become so unbear- 
able that he comes to the point of indicating good adjustment in 
the family and at the same time changes from indications of good 
to poor adjustment in school. Or, again, he may indicate a change
from unsatisfactory to satisfactory adjustment in school, but at 
the same time his “generalized tensions” may become much more 
prominent in his self-report inventory. 

The data analyzed and reported here indicate that the pattern 
of the adolescent’s social and emotional adjustment, as measured 
by the U.C. Inventory I, may be quite erratic during a seven-year 
period. A different type of analysis may provide evidence relative 
to the problem just discussed. 
PUPIL GROWTH IN READING—AN INDEX
OF EFFECTIVE TEACHING 


HAROLD E. MITZEL and DONALD M. MEDLEY 


Division of Teacher Education, Municipal Colleges of New York City 


The fact that much of the recent sharp criticism of modern pub- 
lic education has centered on the teaching of reading reflects the 
importance that the general public attaches to this phase of in- 
struction. The volume of study and experimentation devoted to 
methods and materials of teaching reading indicates that educa- 
tors, too, attach great importance to this part of the school cur- 
riculum. One important phase of the problem seems to have been 
neglected, however—the rôle of the teacher in the teaching of
reading. A recent review of quantitative studies attempting to 
evaluate teacher effectiveness reveals few that employ reading 
growth of pupils as a criterion of effectiveness (5). The general 
conclusion to be drawn from these few studies is that little or noth- 
ing is known about what behavioral and personality characteris- 
tics distinguish the effective teacher of reading from the ineffective 
one. 

Any attempt to identify such characteristics must begin with 
the development of a reliable criterion measure of the effectiveness 
of the teacher on the job to which measures of behavior can be re- 
lated. This report describes the development of such a criterion. 

The chief difficulty that arises when a measure of a teacher’s 
effectiveness is sought in the growth scores of her pupils is the 
difficulty in separating the teacher’s contribution from all of the 
many other factors which affect the scores. Unless a study is re- 
stricted to the teachers in a single school, thereby losing most of 
its generalizability, different teachers will teach in different schools 
located in different communities or neighborhoods. And, if ele- 
mentary-school teachers are being studied, no two teachers will 
teach the same group of pupils. If the mean growth of one teacher’s 
class in one school is greater than that of another teacher’s class 
in another school, it could be argued that this difference results 
from differences in schools or neighborhoods, and from differences 
in pupils, and that perhaps none of the difference can be attributed 
to a difference in teachers. Suppose one teacher has a high-ability 
group in a new, well-equipped building in a middle-class neighbor- 
hood; another has a “slow” section in a superannuated building 
in a slum area. Can these teachers be compared on the basis of 
their pupils’ progress? 

Allowing for school differences. The problem of school differ- 
ences can be handled by including in the sample at least two teach- 
ers who teach in each school used in the study, and comparing 
each teacher only with teachers in her school. This has been 
achieved in the present study by using an analysis of variance 
with differences between teachers in the same school separated 
from differences between teachers in different schools. It is pos- 
sible to test the latter against the former. If they are not greater, 
school differences may be ignored; if they are greater, comparisons 
among teachers should be made only within schools. 

Controlling class differences. The best way of dealing with class 
differences among pupils is to assign pupils in the same school to 
different classes at random. If this is done, then it is safe to con- 
clude that any difference between pupils in different classes in ex- 
cess of that between pupils in the same class is the result of teacher 
(or school) differences. 

Unfortunately, in this study as in most such studies, the assign- 
ment of pupils to classes was not under the control of the investi- 
gators. It was necessary to take each teacher with whatever class 
had already been assigned to her. Naturally, not all the teachers 
were assigned to the same grade; and within a grade, some teachers 
had high-ability groups, some low-ability groups, and some even 
had the local equivalent of a “special” class. 

Clearly, there are two important sources of variation among 
classes which are of primary concern: past achievement and apti- 
tude for learning. It was decided that these two variables would 
be controlled by the technique of covariance analysis. Each pupil 
was tested in reading and in mental maturity at the beginning of 
the school year, and in reading again at the end of the year. 

A note on covariance technique. For those unfamiliar with the 
analysis of covariance, a brief word of explanation may be helpful. 
The relative amount of reading ability possessed by a pupil is in- 
dicated by the deviation of his score on a reading test from the 
mean of all of the pupils’ scores. A reading test administered to a 
pupil at the end of a school year measures how much reading abil- 
ity (in comparison with other pupils) he has acquired in his entire 
school career up to that time. Similarly, a pupil’s score on a reading 
test at the beginning of the year indicates how much he has ac- 
quired before coming to his present class. His score on an intelli- 
gence test administered at the same time as the initial reading 
test is a partial measure of his aptitude for learning. The covari- 
ance technique predicts, on the basis of performances of all the 
pupils at the beginning of the year, what score each pupil may be 
expected to achieve at the end of the year, and then compares the 
score he actually obtains with that score. It is the pupil whose 
final score is farthest above his own expected score whom this 
method regards as having the highest score, instead of the pupil 
farthest above the group mean. 

In other words, the effect of application of covariance adjust- 
ments is to start all pupils off as though they had all had the same 
score on the initial reading test and on the intelligence or mental 
maturity test. 
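
The adjustment described in this note can be illustrated with a minimal Python sketch. It is not the authors' computation: the scores and class labels are hypothetical, only one covariate (the initial reading score) is used, and the slope is taken as the pooled within-class regression slope, which is a common choice in covariance analysis but an assumption here.

```python
from collections import defaultdict


def adjusted_class_means(records):
    """Return (pooled within-class slope, {class_id: adjusted final mean}).

    records: iterable of (class_id, initial_score, final_score) tuples.
    All names here are hypothetical and chosen for illustration only.
    """
    records = list(records)
    by_class = defaultdict(list)
    for class_id, initial, final in records:
        by_class[class_id].append((initial, final))

    # Pooled within-class regression of final score on initial score.
    means = {}
    sum_xy = 0.0
    sum_xx = 0.0
    for class_id, pupils in by_class.items():
        mean_x = sum(x for x, _ in pupils) / len(pupils)
        mean_y = sum(y for _, y in pupils) / len(pupils)
        means[class_id] = (mean_x, mean_y)
        sum_xy += sum((x - mean_x) * (y - mean_y) for x, y in pupils)
        sum_xx += sum((x - mean_x) ** 2 for x, _ in pupils)
    slope = sum_xy / sum_xx

    # Shift every class mean to the value expected if the class had
    # started at the grand mean of the initial scores.
    grand_x = sum(initial for _, initial, _ in records) / len(records)
    adjusted = {
        class_id: mean_y - slope * (mean_x - grand_x)
        for class_id, (mean_x, mean_y) in means.items()
    }
    return slope, adjusted


# Hypothetical data: class A starts and finishes higher than class B.
records = [
    ("A", 30, 40), ("A", 40, 50), ("A", 50, 60),
    ("B", 10, 25), ("B", 20, 35), ("B", 30, 45),
]
slope, adjusted = adjusted_class_means(records)
print(slope, adjusted)
```

In this made-up example class A has the higher unadjusted final mean (50 against 35), but once both classes are placed on the same starting point class B comes out ahead (45 against 40). This is the kind of reordering the covariance technique is meant to expose.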

Other factors. It should be recognized that there may be other 
differences among these classes (average daily attendance, for in- 
stance) that are not measured by the initial tests. Such variables 
have not been controlled in this study, and remain to contaminate 
teacher differences, masking to some extent the real differences 
among teachers. It is necessary to assume that the effect of such 
variables is negligible. Since, so far as could be ascertained, these 
variables were not used as a basis for assigning pupils to classes, 
this assumption is not unreasonable.¹ 


COLLECTION OF DATA 


Sample. In the fall of 1954, a follow-up study was made to locate 
those of the one thousand six hundred twenty-eight teachers in the 
student teaching class of 1953-54 from the municipal colleges who 
were teaching either as regular teachers or permanent substitutes 
in New York City public elementary schools. Each teacher who 
was assigned to teach in grade three, four, five, or six in a school in 
which there was at least one other such teacher was asked to par- 
ticipate in the part of the project under discussion. 

Of the seventy-five teachers who met these criteria, nineteen 
were eliminated either because the schools were inaccessible, be- 
cause the principal or one of the teachers in a school did not wish 
to cooperate, or because one of them failed to accept a proffered 
teaching position. Participation in the study was voluntary, but 
partly because of the rather heavy pressure exerted by district 
superintendents and principals on each teacher to participate, the 
number of teachers omitted at their own request was small—not 
greater than three or four. Two principals failed to cooperate; one 
was the victim of a sudden increase in school population and felt 
that he had so many problems as a result that he was justified in 
dropping out; the other maintained that the additional burden on 
a beginning teacher was too great to justify her even being asked 
to cooperate. 

¹ For an analysis of the implications of such assumptions and what it 
means if they are not fulfilled, see Lindquist (3). 

Of the fifty-six teachers whose classes were tested in the fall, 
seven were dropped from the study during the year. Whenever one 
teacher in a school with two candidates in it withdrew, the other 
teacher was perforce also lost. Table I shows the distribution of the 
sample by schools and grades. 


TABLE I.—GRADE LEVELS TAUGHT BY 49 TEACHERS IN 19 SCHOOLS 

             Grade Level Taught 
School No.   III   IV   V   VI   Total 

1 1 1 2 
2 2 2 4 
3 1 1 2 
4 1 1 1 3 
5 2 2 
6 2 2 4 
7 1 1 1 3 
8 1 1 1 3 
9 2 2 
10 2 2 
11 1 1 2 
12 2 2 
13 2 2 
14 1 1 2 
15 1 1 2 
16 1 1 2 
17 1 2 3 
18 4 1 5 
19 1 1 2 
Totals 23 13 9 4 49 

Test data were incomplete at the beginning of the analysis in 
four classes, so that the covariance analysis to follow was based on 
forty-five teachers. Three of these forty-five teachers were men. 
Except for these selective factors, and the requirement that there 
be two or more new teachers in each school, the sample appears to 
be representative of beginning teachers in New York City public 
elementary schools. 

Tests used. To measure the reading ability of pupils the authors 
selected four sub-tests of the California Reading Test (Elemen- 
tary) —Word Form, Word Recognition, Meaning of Similarities, 
and Interpretation of Meanings. These four sub-tests have a total 
of eighty-seven test items. Three sub-tests entitled Meaning of 
Opposites, Following Directions, and Reference Skills, were omit- 
ted in order to shorten the administration time. The California 
test was chosen as the reading measure for the investigation for 
the following reasons: 

(a) It seems to measure aspects of reading ability consistent 
with the instructional objectives of the New York City 
school system. 

(b) It offers coverage of a wide range of reading skills with 
norms given for grade equivalent scores from 2.0 to 9.8. 

(c) Equivalent forms are available. 

(d) The format does not require complicated marking behav- 
iors on the part of the pupils being tested. 

To estimate the learning capacity of the pupils in this study 
the 1950 Short Form of the California Test of Mental Maturity, 
Non-Language Section was used. The non-language section, con- 
sisting of four sub-tests totaling sixty items, was chosen for the 
following reasons: 

(a) It makes small demand on pupils’ reading ability. 

(b) It is convenient to administer during a single school period. 

(c) It has a range of mental age scores from 5.0 to 20.0. 

(d) The format does not require complicated marking behav- 
iors on the part of the pupils being tested. 

The manner in which the data were collected implies the fol- 
lowing major assumptions: 

(a) That the four portions of the California Reading Test pro- 
vide adequate coverage of the reading objectives of New 
York City teachers. 

(b) That the pupils in each class who were present at both ini- 
tial and final test administrations are representative of the 
whole class taught by each teacher. 

These assumptions are made in addition to the usual assump- 
tions underlying the statistical techniques applied (3). 

Administration of tests. During October and November of 1954 
the initial reading test and the mental maturity test were adminis- 
tered in fifty-six classes to one thousand three hundred twenty- 
five pupils. Selected portions of Form AA of the California Read- 
ing Test were administered to about one-half of each class and 
corresponding portions of Forms BB, CC, or DD were adminis- 
tered to the remainder. Form S of the California Test of Mental 
Maturity, Non-Language Section, was administered after a short 
rest period. In every instance the tests were given by trained psy- 
chometrists who had one or more assistants in the classroom. Pu- 
pils marked their responses directly in the test booklets. 

During April and May of 1955 the same parts of the equivalent 
forms of the reading test were administered in forty-nine of the 
same classes to nine hundred four of the same pupils. The final 
test, given after a lapse of approximately six months, was adminis- 
tered by the examiner who gave the initial test, and the sequence 
of the initial testing was preserved as closely as possible. 


ANALYSIS OF RESULTS 


Differences in final reading score means. An analysis of variance 
of the final reading scores (that is, number of items passed) of the 
eight hundred twenty-six pupils in the forty-five classes about 
which complete data were available is shown in Table II. The mean 
squares shown there may be interpreted to mean that the differ- 
ence between two pupils in the same class is, on the average, equal 
to the square root of one hundred nine or a little more than ten 
points, while the difference between two pupils in different classes 
is, on the average, about forty-nine points. 


TABLE II.—ANALYSIS OF VARIANCE OF FINAL READING SCORES 
OF 826 PUPILS IN 45 CLASSES 

Source             D.F.   Mean Square   F 
Between classes     44     2361.598     21.71** 
Within classes     781      108.788 
Total              825 

** Significant at the 0.01 level. 


TABLE III.—ANALYSIS OF VARIANCE OF FINAL READING SCORES 
OF 826 PUPILS IN 45 CLASSES WITH INITIAL READING SCORES 
HELD CONSTANT 

Source             D.F.   Mean Square   F 
Between classes     44      376.384     7.54** 
Within classes     780       49.934 
Total              824 

** Significant at the 0.01 level. 

The fact that the average difference between pupils in different 
classes is more than four times as great as that between pupils in 
the same class suggests that there may be significant differences 
among classes. That there are significant differences is indicated 
by the F-ratio of 21.7, which has a probability of less than 0.01. 
These differences are probably due in large part to the fact that 
the classes are not all at the same grade level. One would expect 
most sixth graders to read better than most third graders. Whether 
or not these differences may be attributed in part to differences 
in the quality of the teaching exhibited in the different classes 
cannot be inferred from this analysis. 
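
The arithmetic behind these statements can be checked with a short Python sketch. It only reproduces the figures from the mean squares printed in Table II; reading the square root of a mean square as an "average difference" is the authors' interpretation, and the function name is hypothetical.

```python
import math


def summarize_mean_squares(ms_between, ms_within):
    """Return the F-ratio and the square roots of the two mean squares."""
    return {
        "F": ms_between / ms_within,
        "root_ms_between": math.sqrt(ms_between),
        "root_ms_within": math.sqrt(ms_within),
    }


# Mean squares as printed in Table II (unadjusted final reading scores).
table_ii = summarize_mean_squares(2361.598, 108.788)
print(table_ii)
```

The ratio comes to about 21.7, and the two square roots to roughly forty-nine and a little more than ten points, in agreement with the text.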

Mean differences adjusted for initial reading scores. When the 
mean squares are “adjusted” to allow for differences in initial 
scores—when, that is, the departure of each pupil’s score from the 
score predicted for him on the basis of his initial score is used in- 
stead of its departure from the mean of all the final scores—the 
mean square between classes is reduced from two thousand three 
hundred sixty-two to three hundred seventy-six and the mean 
square within classes from one hundred nine to fifty. These results 
are shown in Table III.² 

The average difference between pupils in the same class is still 
seven points, indicating that even when initial status is allowed 
for, there are considerable differences in the effectiveness of a 
teacher with different pupils. The average difference between pu- 
pils in different classes is now about nineteen points. This means 
that, even if all of the pupils in all of the classes had started out 
with the same reading score, the difference between pupils with 
different teachers would have been greater than that between pu- 
pils with the same teacher. The F-ratio of 7.54, significant at the 
one per cent level, justifies this conclusion. 

² Computational procedures used are given in detail by Johnson (2). 


TABLE IV.—ANALYSIS OF VARIANCE OF FINAL READING SCORES OF 
826 PUPILS IN 45 CLASSES WITH INITIAL READING AND MENTAL 
MATURITY SCORES HELD CONSTANT 

Source             D.F.   Mean Square   F 
Between classes     44      386.400     7.97** 
Within classes     779       48.496 
Total              823 

** Significant at the 0.01 level. 

Adjustment for mental maturity. It is still doubtful whether 
these differences represent differences in teachers’ skill, since the 
average learning ability (in the sense of mental maturity) of 
classes might differ. A second “adjustment” was therefore made for 
mental maturity, with the results shown in Table IV. 

Here the average difference between classes increases slightly 
while the average difference within classes decreases slightly, in- 
creasing the F-ratio to 7.97. Most of the differences among pupils 
in mental maturity seem to have been reflected in differences in 
initial reading score, and their effect was removed in the first ad- 
justment. 

At this stage it seems justifiable to conclude that there are dif- 
ferences in average effectiveness among these forty-five teachers. 
Allowing for differences among pupils with respect to initial read- 
ing scores and mental maturity scores, there remains an average 
difference among pupils with the same teacher in final reading 
scores of 6.9 points, while the average difference between pupils 
with different teachers is 19.7. A given teacher is not equally ef- 
fective with all pupils, but some teachers are more effective on the 
average than others. 

Estimation of reading growth indices. The next step in the anal- 
ysis was the estimation of the adjusted mean final reading test 
score of each class—the reading growth index.³ Similar indices 
were computed for the four classes not used in the analysis. For 
comparison, mean final reading test scores adjusted for initial 
reading scores only, and unadjusted mean final reading scores were 
also computed. 

³ The method of estimation used is that described by Cochran and Cox (1). 

Table V shows the class mean final reading scores, unadjusted, 
adjusted for initial score only, and adjusted for both initial score 
and mental maturity score. The grade levels and class sizes are 
also shown. The first, third and fifth columns show the ranks of 
the classes on each of the three sets of means. 

Effects of covariance adjustments on rank order. The rank order 
of the classes changes considerably with the first adjustment. The 
class which ranks nineteenth in unadjusted final scores ranks third 
when adjusted for initial scores; the second highest score moves to 
the thirty-first place when adjusted. The power of the covariance 
method is now apparent. In addition to eliminating much varia- 
tion not related to teacher effectiveness, reducing the average in- 
terclass difference from forty-nine to nineteen points, covariance 
adjustment also rearranges the rank order of the classes. When the 
initial scores are allowed for, the class which has the second- 
highest final score is seen to have shown less improvement over 
initial standing than thirty other classes. 

The ranks change relatively little with the second adjustment— 
that for mental maturity—although the class originally second 
moves down another three places to the thirty-fourth rank when 
mental maturity is allowed for. This third set of ranks represents 
the nearest approximation to the correct rank order of these forty- 
nine teachers in ability to promote reading growth that can be in- 
ferred from these data. 

Study of School Differences and Grade Differences. Two more 
factors which may affect the differences among classes were 
studied—school differences and grade differences. Table VI shows 
the results of an analysis of variance of reading growth indices 
into components between and within schools. The average differ- 
ence between classes in different schools is 5.5 points while the 
average difference between classes in the same school is 4.8; al- 
though the former is somewhat larger than the latter, the difference 
is not significant. It may, therefore, be concluded that differences 
among schools in the amount of pupils’ reading growth were not 
detected in this study. 
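
A minimal Python sketch of this kind of between-schools and within-schools breakdown is given below. The class means are hypothetical and the function name is an illustrative assumption; it is a plain one-way analysis of variance on class means grouped by school, not a reproduction of the authors' worksheet. With nineteen schools and forty-nine classes the same formulas give the 18 and 30 degrees of freedom shown in Table VI.

```python
def one_way_anova(groups):
    """One-way analysis of variance.

    groups: dict mapping a group label (here, a school) to a list of values
    (here, adjusted class means). Returns
    (df_between, df_within, ms_between, ms_within, f_ratio).
    """
    values = [v for members in groups.values() for v in members]
    grand_mean = sum(values) / len(values)

    ss_between = 0.0
    ss_within = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        ss_between += len(members) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in members)

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return df_between, df_within, ms_between, ms_within, ms_between / ms_within


# Hypothetical reading growth indices for two classes in each of three schools.
schools = {
    "school_1": [50.0, 54.0],
    "school_2": [46.0, 48.0],
    "school_3": [52.0, 56.0],
}
df_b, df_w, ms_b, ms_w, f_ratio = one_way_anova(schools)
print(df_b, df_w, ms_b, ms_w, f_ratio)
```

The F-ratio would then be referred to a table of F for the stated degrees of freedom; that lookup is left out of the sketch.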

TABLE V.—COMPARISON OF UNADJUSTED, SINGLY ADJUSTED AND 
DOUBLY ADJUSTED FINAL READING SCORE MEANS OF 49 
ELEMENTARY SCHOOL CLASSES 

Class      Final Reading   Final Reading Score   Final Reading Score    Grade   Number 
and Rank   Score           Adjusted for          Adjusted for Initial   Level   of Pupils 
           (unadjusted)    Initial Score         Score and Mental 
                                                 Maturity 
           Mean            Rank     Mean         Rank     Mean 

1          74.12           2        58.56        2        58.40         5       25 
2          69.43           31       47.71        34       47.24         6       21 
3          65.52           15       51.10        16       51.14         5       21 
4          65.36           6        56.79        6        56.86         5       14 
5          65.14           1        60.43        1        60.43         4       7 
6          64.60           18       50.67        17       50.80         4       15 
7          63.75           24       49.32        26       49.00         4       28 
8          63.19           5        56.88        7        56.69         5       16 
9          58.70           17       50.95        20       50.50         5       20 
10         57.57           10       54.05        12       53.57         3       21 
11         57.50           8        55.92        8        56.00         3       24 
12         56.65           21       50.12        25       49.29         6       17 
13         56.40           19       50.40        19       50.53         4       15 
14         56.14           20       50.29        23       49.48         6       21 
15         54.77           4        57.23        3        58.08         6       13 
16         54.07           30       47.93        29       47.78         4       27 
17         53.72           9        54.41        10       54.45         3       29 
18         52.95           22       50.05        21       50.05         4       19 
19         52.75           3        58.08        4        57.71         5       24 
20         52.03           32       47.48        32       47.34         3       29 
21         51.66           11       53.91        11       53.97         3       32 
22         50.96           7        56.24        5        57.04         5       25 
23         50.21           14       52.38        14       52.42         3       24 
24         49.92           28       48.31        31       47.62         5       13 
25         48.29           29       47.94        27       48.12         3       17 
26         47.67           13       53.42        13       53.08         4       12 
27         45.56           16       51.06        18       50.75         3       16 
28         44.13           27       48.50        28       47.96         4       24 
29         43.06           12       53.50        9        54.75         3       16 
30         43.00           34       47.32        34½      47.05         3       19 
31         42.05           33       47.40        34½      47.05         3       20 
32         41.33           39       44.67        42       44.00         4       9 
33         40.94           25       49.22        22       49.50         3       18 
34         40.86           26       49.00        24       49.43         3       14 
35         39.68           23       49.89        15       51.37         3       19 
36         39.50           36       46.86        36       46.36         5       14 
37         39.29           46       41.59        47       41.47         4       17 
38         37.59           35       47.18        30       47.76         3       17 
39         35.50           45       42.09        46       41.82         4       22 
40         35.11           40       44.47        40       44.79         4       19 
41         35.09           41       44.26        41       44.65         3       23 
42         33.38           42       44.00        43       43.88         4       8 
43         31.13           44       42.12        45       42.25         3       16 
44         30.11           43       43.22        38       45.44         3       9 
45         30.00           37       44.88        37       45.62         3       8 
46         29.86           38       44.82        39       45.41         3       22 
47         27.42           47       41.42*       44       42.46         3       24 
48         26.85           49       40.08        48       41.38         3       13 
49         23.38           48       40.50        49       41.00         3       8 

Weighted 
Grand Mean 48.75                    48.75                 48.75         Total   904 
Range      50.74                    20.35                 19.43 

* The mean of 41.42 was calculated from a total, a portion of which was 
derived by regression methods. 


Table VII shows an analysis of variance of reading growth in- 
dices into components between and within grades. Pupils in dif- 
ferent grades show a difference on the average of 8.5 points. 
Whether or not this difference is significantly greater than that 
between pupils in the same class remains in doubt because the 
F-ratio has a probability between 0.01 and 0.05. 

TABLE VI.—ANALYSIS OF VARIANCE OF 49 TWICE-ADJUSTED CLASS 
MEANS TO TEST FOR SCHOOL DIFFERENCES 

Source             D.F.   Mean Square   F 
Between schools     18      30.461      1.34 
Within schools      30      22.778 
Total               48 


TABLE VII.—ANALYSIS OF VARIANCE OF 49 TWICE-ADJUSTED 
CLASS MEANS TO TEST FOR GRADE EFFECTS 

Source                  D.F.   Mean Square   F 
Between grade levels      3      71.884      3.18* 
Within grade levels      45      22.578 
Total                    48 

* Significant at the 0.05 level. 


Reliability Estimate. The final step in the covariance analysis 
is to estimate the reliability of the reading growth indices. The re- 
liability of a test in any group is the ratio of the variance of the 
true scores to the variance of the obtained scores of that group. 
In this instance, the variance of the obtained scores is estimated 
by the mean square between classes in Table IV. The error vari- 
ance is estimated by the mean square within classes. If the error 
variance is subtracted from the obtained score variance an esti- 
mate of the true score variance is obtained; the reliability may 
therefore be estimated to be: 

$$ r = \frac{386.400 - 48.496}{386.400} = 0.874 $$ 

This coefficient may be interpreted as representing the expected 
correlation between the mean reading growth indices of the forty- 
five classes studied and the mean reading growth indices that 
would be obtained by forty-five other classes taught by the same 
teachers. Its magnitude indicates that had these teachers been 
assigned to other, similar classes, their rank order would not have 
differed greatly from that shown in Table V. 
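
The reliability estimate can be reproduced with a few lines of Python. The function name is hypothetical; the two mean squares are those printed in Table IV, and the formula is the one stated in the text (obtained-score variance less error variance, divided by obtained-score variance).

```python
def reliability(ms_between, ms_within):
    """Estimated reliability: (obtained variance - error variance) / obtained variance."""
    return (ms_between - ms_within) / ms_between


# Mean squares between and within classes from Table IV.
r = reliability(386.400, 48.496)
print(round(r, 3))
```

Written this way, the estimate is simply one minus the reciprocal of the F-ratio in Table IV, so a larger F-ratio for class differences implies a more reliable growth index.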





SUMMARY AND DISCUSSION 


A follow-up of one thousand six hundred twenty-eight students 
in New York City municipal colleges located seventy-five of them 
beginning their teaching careers in grades three, four, five, or six 
in City public elementary schools which were employing at least 
two such teachers. Of these teachers, nineteen were eliminated and 
the classes of the other fifty-six were tested with a reading test 
and a mental maturity test in the fall of 1954 and forty-nine of 
them were tested with equivalent forms of the reading test the 
following spring. 

The analysis of variance and covariance was used to estimate 
the average effectiveness of each teacher in stimulating her pupils 
to learn to read, taking into account: 

(a) Differences in learning aptitude among pupils. 

(b) Differences in previous reading achievements among pupils. 

(c) Differences in grade level. 

(d) Differences in average amounts of improvement in reading 
ability among pupils in different schools. 

The classes of these forty-nine beginning teachers were found 
to differ widely in the average amount of improvement in reading 
ability over one year, even when allowance was made for differ- 
ences in aptitude and amounts learned in previous years. These 
differences were no greater between classes in different schools 
than between classes in the same school. In general, there was 
some evidence that beginning teachers’ effectiveness in the area of 
reading varies for different grades, but the evidence was not con- 
clusive. The reliability of the differences in average reading im- 
provement among classes was such that if the study had been re- 
peated with the same teachers in other classes, the mean growth 
scores of the classes would have correlated 0.87. 

It was therefore concluded that there are substantial differences 
among beginning teachers in New York City public schools in ef- 
fectiveness in stimulating pupils to learn to read, and that these 
differences cannot be attributed wholly to differences among 
schools nor to differences in pupils’ learning ability or previous 
achievement. 


A FURTHER INVESTIGATION OF THE 
RELATIONSHIP BETWEEN ANXIETY 
AND CLASSROOM EXAMINATION 
PERFORMANCE 


ALLEN D. CALVIN, F. J. McGUIGAN, and MAURICE W. SULLIVAN 


Hollins College 


In a recent paper McKeachie, Pollie, and Speisman (3) found 
that students who were given the opportunity to write comments 
about their objective examination questions performed better on 
the last half of their examination than students who were not 
given an opportunity to comment. They pointed out that this 
finding “... fitted in with our theory that tension is built up 
throughout the test and that giving opportunity to comment re- 
duces the increasing tension.” (3, p. 95). They also reported a re- 
lationship between examination scores and McClelland’s “need 
for achievement” (2) personality variable. On the basis of their 
results they concluded: “Anxiety inhibits performance. Giving 
students an opportunity to write comments aids not only in re- 
ducing the threat but also in channeling the release of anxiety” 
(3, p. 98). 

Since McKeachie et al., themselves, noted after their first ex- 
periment that “our results seemed too good to be true”, an attempt 
to replicate their findings seems called for. In addition, the fact 
that they had no direct measure of anxiety indicates the need for an 
experiment where such a measure is provided. The following study 
is therefore an attempt to replicate McKeachie et al.’s results, 
and to test their “anxiety reduction” hypothesis. 


METHOD 


Subjects. The Ss, one hundred and fifty-two undergraduate fe- 
male students from Hollins College, were taken from five classes 
as follows: sixty-one and thirty-five from two introductory psy- 
chology classes, seventeen and twenty-one from two introductory 
Spanish classes, and eighteen from a Spanish literature class. In 
each case the entire class participated. 

Procedure. In an attempt to distinguish between high anxious 
and low anxious Ss, the A-Scale with the biographical inventory 
as described by Taylor (6) was given. The Otis Higher Examina- 
tion of Mental Abilities (5) was also given. Both tests were ad- 
ministered at the beginning of each course. The rest of the pro- 
cedure was similar to that used by McKeachie et al., who modified 
their methodology slightly from group to group. The specific pro- 
cedure used in our experiment was as follows: The examination 
consisted of randomly presented, multiple choice items and was 
the first examination given in the course for each group. Each 
class was divided at random into an experimental and a control 
group. The experimental group in each class received answer 
sheets with the instructions: “Put an X through the best answer 
for each item. Feel free to make any comments about the items 
in the space provided,” and the control group received answer 
sheets with the instructions: “Put an X through the best answer 
for each item. Do not mark in the space to the right of your 
answers.” 


RESULTS¹ 


In all five classes, the experimental group made fewer errors 
on the last half of the examination than did the control group. 
Tests between experimental and control groups in each class, how- 
ever, failed to reach the 5% point of significance. An F test between 
classes was made and no significant difference obtained, so the 
findings for all the classes were combined and the one-tailed bino- 
mial expansion indicated that the present results could be ex- 
pected by chance approximately three times out of one hundred. 
Accordingly, the null hypothesis that there was no difference in 
performance between experimental and control groups on the last 
half of the examination was rejected. 
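
The binomial figure quoted above can be checked with a minimal Python sketch. It assumes, since the paper does not spell this out, that each of the five classes is treated as an independent trial in which the experimental group is as likely as not to outperform the control group under the null hypothesis. The function name is hypothetical.

```python
from math import comb


def one_tailed_binomial(successes, trials, p=0.5):
    """Probability of at least `successes` successes in `trials` trials."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# All five classes favored the experimental group.
p_value = one_tailed_binomial(5, 5)
print(p_value)
```

Under that assumption the probability is one-half raised to the fifth power, or about 0.03, which matches the "three times out of one hundred" reported in the text.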

The largest introductory psychology class permitted us to make 
an analysis in terms of the A-Scale. The sixty-one Ss were divided 
at the median and classified as low and high anxiety groups. The 
high anxiety Ss in the experimental group were significantly worse 
than the low anxiety Ss (p < 0.04, one-tailed t test) on the first 
half of the examination, but did not differ significantly from them 
on the second half (see Table I). An evaluation, in terms of gain 
from the first half of the examination to the second, showed that 
the high anxious experimental Ss were significantly superior to the 
low anxious experimental Ss (p < 0.005). When an analysis of 
covariance was run to remove the effect of initial scores on gain 
scores, the resulting F was still significant (p < 0.02). This 
difference in gain scores is indicated in Table I by the sharp 
reduction in errors for the high anxiety experimental group on the 
second half of the test, and by the slight increase in errors for the 
low anxiety experimental group. For the control group, the high 
anxiety Ss made more errors than the low anxiety Ss (although not 
significantly more) on both the first and second half of the test; 
while the high anxiety Ss did improve somewhat on the second half 
of the test, the low anxiety Ss improved even more.² 


TABLE I.—ERRORS ON FIRST AND LAST HALVES OF EXAMINATION 

Group               First Half   Second Half 
Experimental 
  High anxiety         9.58         7.25 
  Low anxiety          7.24         7.35 
Control 
  High anxiety         9.00         8.82 
  Low anxiety          8.43         6.78 


¹ Data gathered in connection with another study made it possible to test 
McKeachie et al.’s “anxiety reduction” hypothesis on a limited number of 
Ss using another measure of “anxiety” [the Palmer perspiration index de- 
scribed by Mowrer (4)]. This analysis yielded non-significant results. 

The correlation between total number of comments and number
of errors on the second half of the examination was -0.19 for the
high anxiety experimental group, and -0.40 for the low anxiety
experimental group, neither of which was significant.

The correlation between Otis scores and anxiety scores was
-0.05.


DISCUSSION 


The finding that all five experimental groups were superior to
their corresponding controls corroborates the findings of McKeachie
et al. The failure of the individual intra-class comparisons
to reach acceptable levels of statistical significance may well be
due to the smaller Ns in our classes than in those of McKeachie et al.

*In a personal communication from W. F. Smith, we have learned that
he and F. C. Rockett did a related study using the Sarason anxiety scale.
They did not analyze their results to see if high anxiety Ss showed a greater
gain than low anxiety Ss on the second half of the examination, but they
varied type of instruction and, although the effect of type of instruction was
not significant, they found a significant interaction between type of
instruction and anxiety level. High anxiety Ss benefited more from
anxiety-reducing instruction.

The sharp drop in errors for our high anxiety experimental
group on the second half of the examination is similar to the
behavior of McKeachie et al.’s low need for achievement group and
supports their anxiety reduction hypothesis.

McKeachie et al. report a correlation for their low need for
achievement group of 0.73 between scores on the last half of the
examination and number of comments, and for their high need for
achievement group of 0.05. Our correlations, being between number
of errors on the last half of the examination and number of comments,
of course reverse the sign, so that in both McKeachie et al.’s Ss
and ours, those Ss who made the most comments performed the
best. Neither of our correlations, however, approached the magnitude
of that for McKeachie et al.’s low need for achievement group, and
our low A-Scale group actually had a higher correlation than our
high A-Scale group, although neither of our correlations was
significant. Rather than speculate on these results at this time, it
seems wiser simply to note that there is a trend for those Ss who
make more comments to make fewer errors on the last half of
their examinations.

The practically zero correlation between Otis scores and anxiety 
scores certainly does not point toward any relationship between 
anxiety and intelligence in our sample, although we cannot of 
course definitely rule out the possibility that there may exist dif- 
ferences in intellectual makeup between our high and low anxiety 
Ss which are not shown by the Otis (1, 7). 

The present results, taken together with those of McKeachie 
et al., indicate the applicability of “drive reduction learning the- 
ory” to the classroom situation. 


SUMMARY 


In an effort to confirm findings by McKeachie et al., five under- 
graduate classes were divided into experimental and control groups. 
The experimental groups were allowed to comment on items in 
an objective examination while the control groups were not. When 
the results of the five classes are combined, the experimental Ss 
performed significantly better than the controls on the second half 
of the examination. The largest class, with an N of sixty-one, was
divided into high and low anxiety Ss on the basis of their A-Scale
scores. Anxiety was found to be a significant variable in terms of
improvement from the first half of the test to the second in the
experimental group. It was also found that those Ss who made
the most comments showed the greatest improvement, but this
relationship was not significant. Implications of these findings are
discussed.
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SEARCHING ORIENTATION AND 
CONCEPT LEARNING*


GABRIEL M. DELLA-PIANA 


Public School District No. 111, Highwood, Illinois 


Results of research during the first three decades of the twentieth
century established that learning may be enhanced by giving
the learner “knowledge of results.” McGeoch and Irion (8, pp.
265-268), in summarizing much of this research, conclude that
knowledge of results has its greatest learning value when obtained
with optimum immediacy, specificity, and frequency. Theoretical
interpretations of these conclusions (1, 2, 9, 11) suggest three
processes by which knowledge of results may influence learning:
(a) showing progress and thus “motivating” the learner; (b)
presenting a standard and thus “guiding” the learner’s trial responses;
(c) indicating errors and allowing the learner to find out why
the response was wrong, thereby “activating a searching
orientation” in the learner.

The present study assumes that a “searching orientation” may 
be induced by manipulating the method by which the subject ob- 
tains “informative feedback.” On the basis of this assumption, 
predictions are made concerning how variations in amount of 
searching behavior may influence the learning of concepts. “In- 
formative feedback procedures” is a brief term used in this paper 
in lieu of the longer phrase “procedures for giving a learner knowl- 
edge of which responses are incorrect.” “Searching orientation” 
refers to the subject’s hypothesizing as to why a response was not 
correct. The hypotheses thus developed by the subject are tested 
by him each time he receives knowledge of results. Presumably, 
this process leads to the learning, not only of correct responses, 
but also of reasons underlying the correct responses. Thus, the
searching orientation was presumed to act as an intervening
variable through which knowledge of results might influence learning.
Underwood (11, pp. 414-417) mentions this “searching orientation”
(he calls it “attempt to discover”) as a variable probably
accounting for the learning value of knowledge of results. Jones
(6) and McConnell (7) conducted studies of classroom teaching
procedures in which a “searching orientation” seemed to be
induced by feedback procedures and appeared to facilitate
“meaningful” learning.

*A more complete presentation of the hypotheses, data, and procedures
of this study is available in G. M. Della-Piana’s “Two Experimental
Feedback Procedures: A Comparison of their Effects on the Learning of
Concepts,” unpublished Ph.D. dissertation, 1956, on file in the University of
Illinois Library. Also available as microfilm, Publication No. 16,386,
University Microfilms, 313 No. First St., Ann Arbor, Michigan, $1.10. The
study was done under the direction of Professors Lee J. Cronbach and J. T.
Hastings.


METHOD 


Two experimental treatments. Two informative feedback pro- 
cedures were developed to produce variations in “searching orien- 
tation” of subjects. These procedures constituted our two experi- 
mental treatments: one (dependency treatment) in which a subject 
was immediately told the correct answer when he gave an incorrect 
response, and the other (searching treatment) in which a subject 
was told to keep trying until he “discovered” the correct answer. 
Our searching treatment is actually the standard “correction pro- 
cedure” used in experimental learning studies (10, p. 13). In the 
present study it is called the “searching treatment” because it was 
assumed to induce the subject to develop hypotheses as to why 
his responses were wrong. The dependency treatment was devel- 
oped to produce a lesser degree of searching behavior than that 
produced by the searching treatment. We attempted to attain this 
aim by varying the feedback procedure so as to force the subject 
to make a response antagonistic to searching. The response we 
attempted to induce by the dependency treatment was the rela- 
tively non-inquiring one of “looking to the experimenter for the 
answers.” Differences between treatments, other than the theoreti- 
cally crucial difference, were minimized. For example, “time on 
item” differences were minimized; however, the degree of “search- 
ing” engaged in during that time was varied. We did not directly 
investigate the psychological process of “searching.” Instead, we 
investigated hypothesized differential effects which variations in 
“degree of searching” presumably had on the learning of concepts. 
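
The contrast between the two feedback procedures can be made concrete with a small sketch. The function `run_item`, its argument names, and the nonsense name MULP are hypothetical; only the rule itself (dependency subjects are told the answer after one wrong response, searching subjects keep trying) comes from the description above.

```python
def run_item(treatment, correct_name, responses):
    """Return (errors, told_answer) for one card presentation.

    treatment: "dependency" or "searching"
    responses: the names the subject would offer, in order (hypothetical input)
    """
    errors = 0
    for response in responses:
        if response == correct_name:
            return errors, False      # subject produced the name unaided
        errors += 1
        if treatment == "dependency":
            return errors, True       # experimenter supplies the answer at once
        # searching treatment: the subject is told to keep trying
    raise ValueError("responses exhausted before the item was resolved")


# Same string of guesses under each procedure.
guesses = ["LORB", "MULP", "BLAG"]
print(run_item("dependency", "BLAG", guesses))  # stops after the first error
print(run_item("searching", "BLAG", guesses))   # continues until correct
```

Under this sketch a dependency subject can make at most one incorrect response per card, whereas a searching subject can make several before reaching the correct name.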

Two ability levels. An attempt was made to utilize a two-by-two
treatments-by-levels design. Guilford’s Figure Exclusion Test
(3) was used as a basis for dividing subjects into ability levels,
since this test measured factors related to those called for by our
learning task. Of the eighty-two students tested, we selected the
forty extremes on the Guilford test (twenty in each treatment).
Unfortunately, almost half of the differences between ability level
groups on the Figure Exclusion Test were lower than two times
our standard error of the difference between scores. Thus, we had
very little confidence in the differences between ability level groups
on this test.

No statistically significant differences between ability level 
groups were obtained in this study on any of the dependent vari- 
able measures. We may say that differences between levels, if they 
existed, were not strong enough to show up in our sample under 
the conditions noted above. Therefore, further data on “levels” 
will not be reported in this paper. 

The concept learning task. An inductive concept formation task 
developed by Heidbreder and Ivy (4) was used. We chose this 
task for two reasons: First, it may be learned either through 
“rote association” (naming the concepts without being able to 
identify or define their distinguishing characteristics) or through 
more “meaningful association” in which the learner can name and 
define the concept (4, p. 124). This was an essential characteristic 
of the task used in this study since we expected our two feedback 
procedures to differ with respect to the extent to which they facili- 
tated one or another of these two kinds of learning. Secondly, the 
task is similar in many respects to certain school learning tasks 
where drill procedures similar to our experimental feedback pro- 
cedures are utilized; e.g., in the learning of basic arithmetic facts 
or sight vocabulary. 

As used in this study, “concepts” refer to certain common
attributes of a group of geometrical designs presented on cards. The
particular concepts we utilized were two of color (e.g., “blueness”),
two of shape (e.g., “diamond-shapedness”), and two of number
(e.g., “fourness”). Each of the six concepts was given a
nonsense-syllable name, such as BLAG or LORB.

The concepts were presented to the subject embedded in drawings
on 3” by 5” cards. The subject was presented the drawings in
ten separate series of six drawings each. Each of the six drawings in
a series contained one of the six concepts. Each concept was
represented in each series; however, in addition to the constant
component of color, shape, or number, which defined the concept, it
had a variable component from series to series. Thus, the concept
“fourness” was always represented by a drawing of four geometric
designs. However, in one series it was “four, pink, triangular
designs”; in another series it was “four, orange, rectangular designs”;
and so on; but it was always four designs.

The subject was asked to learn the nonsense syllable names of 
designs presented to him singly and successively in series after 
series of cards. The conditions made it possible for the subject 
to discover that a given nonsense syllable, though used as the 
name of very different designs in the different series, was always 
applied to designs which, taken together, possessed a common and 
distinguishing characteristic. Thus, “BLAG” was always used as 
the name of a design of the concept “fourness” (the design having 
four objects) though the shape and color of the design varied from 
series to series. 
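
The structure of the card series described above can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of the Heidbreder and Ivy materials: only “blueness,” “diamond-shapedness,” “fourness,” and the pairing of BLAG with “fourness” come from the text; every other concept value, every nonsense name after BLAG and LORB, the pairing of LORB with “blueness,” the filler values, and the shuffled presentation order are hypothetical placeholders.

```python
import random

# Six concepts: two of color, two of shape, two of number.
# Only BLAG -> fourness is taken from the text; the rest is hypothetical.
CONCEPTS = {
    "BLAG": ("number", 4),
    "LORB": ("color", "blue"),
    "MULP": ("shape", "diamond"),
    "DAXO": ("color", "red"),
    "FEPI": ("shape", "circle"),
    "ZUGE": ("number", 2),
}

# Filler values for the components that vary from series to series. They are
# kept distinct from the concept-defining values so that each card shows
# exactly one concept (an assumption of this sketch).
FILLERS = {
    "color": ["pink", "orange", "green"],
    "shape": ["triangle", "rectangle", "star"],
    "number": [3, 5, 6],
}


def make_series(rng):
    """One series: six cards, one per concept, in shuffled order."""
    cards = []
    for name, (dimension, value) in CONCEPTS.items():
        card = {dim: rng.choice(options) for dim, options in FILLERS.items()}
        card[dimension] = value       # the constant, concept-defining component
        card["name"] = name
        cards.append(card)
    rng.shuffle(cards)
    return cards


rng = random.Random(0)
all_series = [make_series(rng) for _ in range(10)]
```

Every BLAG card produced this way shows four designs, while its color and shape change from one series to the next.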


DATA AND SCORES 


The hypotheses of the study will have more meaning to the 
reader after a brief description of the data and scores has been 
digested. There are three sets of scores: learning series scores, 
post-treatment scores, and scores combining learning series and 
post-treatment measures. 

Learning series scores. In the learning series a concept was said 
to be “attained” when a subject named it correctly on the first 
trial (and all subsequent trials) of at least two consecutive series 
and when he never named it incorrectly after these two consecutive 
series. Thus, “concept attainment” refers to ability to give non- 
sense syllable names of concepts without regard to whether one 
knows the distinguishing feature of the concept. 
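
The attainment criterion amounts to requiring an unbroken error-free run of at least two series at the end of a subject's record for a concept. A minimal sketch follows; the function name and input format are hypothetical, and taking the first series of the final error-free run as the "series required for attainment" is an assumption, since the text does not say how that count was scored.

```python
def attainment_series(errors_per_series):
    """Return the 1-based series at which the concept was attained, or None.

    errors_per_series[i] is the number of incorrect namings of the concept
    in series i + 1 (0 means it was named correctly on the first trial).
    """
    # Length of the error-free run at the end of the record.
    run = 0
    for errors in reversed(errors_per_series):
        if errors != 0:
            break
        run += 1
    if run < 2:
        return None                   # two consecutive error-free series not reached
    # Assumption: attainment is dated from the start of the final error-free run.
    return len(errors_per_series) - run + 1


print(attainment_series([2, 1, 0, 0, 0]))   # attained
print(attainment_series([0, 0, 1, 0]))      # error after the run, single clean series left
```

An early pair of correct series followed by a later error does not count, because the criterion also requires that the concept never be named incorrectly afterward.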

The raw data of the learning series consist of an error tally on
a concept for each trial (within a series) in which the subject
names the concept incorrectly. From these data we obtained a sum
of series required for concept attainment and a sum of number
of concepts attained.

Two means, derived from these data, were used in testing the 
hypotheses of this study. 

(a) Mean number of concepts attained (NCA). This was
obtained by summing the total number of concepts attained by each
person in a treatment and dividing this sum by the total number
of persons in that treatment.

(b) Mean number of series to attain concept (SCA). This was 
obtained by summing the total number of series required for at- 
tainment of a given concept by each person in a treatment and 
dividing this sum by the total number of persons in that treatment. 
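
Both measures are plain arithmetic means over the persons in a treatment. The sketch below uses hypothetical function names and made-up numbers; how subjects who never attained a given concept entered the SCA sum is not stated in the text, so the sketch simply assumes one value per person.

```python
def mean_concepts_attained(attained_counts):
    """NCA: attained_counts[i] is the number of concepts attained by person i."""
    return sum(attained_counts) / len(attained_counts)


def mean_series_to_attain(series_counts):
    """SCA for one concept: series_counts[i] is the series person i needed."""
    return sum(series_counts) / len(series_counts)


# Made-up scores for a four-person treatment group (illustration only).
nca = mean_concepts_attained([4, 5, 3, 6])
sca = mean_series_to_attain([3, 7, 5, 5])
print(nca, sca)
```

NCA yields one mean per treatment, whereas SCA yields one mean per concept within each treatment.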

Post-treatment scores. Following the learning series, each sub- 
ject was given two tests: first, an open-ended definitions test and 
then a multiple choice definitions test. These tests presented the 
subject with a list of the nonsense syllable names of the concepts 
and required the subject first to write a definition of the dis- 
tinguishing feature of the concept and then to select, from among 
several alternatives, the distinguishing feature of the concept. The 
test score was the number of concepts correctly defined. On the
open-ended test this was called a “recall of meaning” score and on
the multiple choice test it was called a “recognition of meaning”
score.

Two means, derived from these scores, were used in testing the 
hypotheses of this study. 

(a) Mean recall of meaning (RceM). This was obtained by 
summing the total number of meanings recalled by each person in 
a treatment and dividing this sum by the total number of persons 
in that treatment. 

(b) Mean recognition of meaning (RgM). This was obtained by 
summing the total number of meanings recognized by each person 
in a treatment and dividing this sum by the total number of per- 
sons in that treatment. 

Thus, the post-treatment measures allowed us to determine the 
mean number of concept definitions recalled and recognized by 
each treatment group. 

Combined learning series and post-treatment score. One mean 
was obtained by use of combined measures. 

(a) Mean meaningful-concept attainment (MCA). This was 
obtained by summing the percentage (decimal fraction) of at- 
tained concepts for which each person in a treatment could recall 
the correct meaning and dividing this sum by the total number of 
persons in that treatment. 
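
A minimal sketch of the MCA computation follows. The function name and the set-based input are hypothetical, and the handling of a person who attained no concepts (for whom the fraction is undefined) is an assumption of the sketch: such a person is left out, which the text neither confirms nor rules out.

```python
def meaningful_concept_attainment(subjects):
    """MCA: mean fraction of attained concepts whose meaning was recalled.

    subjects: list of (attained, recalled) pairs, each a set of concept names.
    """
    fractions = []
    for attained, recalled in subjects:
        if not attained:
            continue                  # assumption: undefined fraction is skipped
        fractions.append(len(attained & recalled) / len(attained))
    if not fractions:
        raise ValueError("no subject attained any concept")
    return sum(fractions) / len(fractions)


# Made-up records for two subjects (illustration only).
example = [
    ({"BLAG", "LORB"}, {"BLAG"}),          # recalled 1 of 2 attained concepts
    ({"BLAG"}, {"BLAG", "LORB"}),          # recalled 1 of 1 attained concepts
]
mca = meaningful_concept_attainment(example)
print(mca)
```

A definition recalled for a concept that was never attained does not raise the score, since only attained concepts enter the fraction.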

Thus, combining learning series and post-treatment measures
allowed us to determine the mean percentage of “attained” concepts
for which a treatment group could recall the definitions.

TABLE I.—SUMMARY OF TESTS OF SIGNIFICANCE OF DIFFERENCES
BETWEEN TREATMENT MEANS ON FOUR VARIABLES
(d.f. = 1 and 36) (Treatment N = 20)

                                   Dependent Variable Measures
                        Mean of    Mean of concept  Mean of concept  Mean fraction
                        concepts   definitions      definitions      of named concepts
                        named      recalled         recognized       recalled
Searching treatment      4.40         4.75             5.10             .97
Dependency treatment     3.75         3.60             4.05             .81
Difference                .65         1.15             1.05             .16
Mean square*             4.22        13.22            11.03             .24
F ratio†                 1.84         8.58‡            9.59‡          12‡

* Between treatments.
† Using within treatments mean square.
‡ Significant beyond 0.01 level.


TESTS OF HYPOTHESES 


Results of tests of the major hypotheses of the study are presented in
Table I. Although the study was set up with a two-treatments-by-two-levels
design, no data on levels or treatment-levels interactions
are presented here because of the non-significance of “levels”
differences on both ability and dependent variable measures.
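
The mean squares in Table I can be cross-checked against the treatment means. For two treatments of equal size n, the between-treatments sum of squares is n(difference)²/2 on one degree of freedom, and the within-treatments mean square follows from F = MS between / MS within. The sketch below applies that standard identity; the function names are hypothetical, and it assumes a balanced design with twenty subjects per treatment, as the table heading states.

```python
def between_treatments_mean_square(mean_a, mean_b, n_per_treatment):
    """Between-groups mean square for two equal-sized groups (1 d.f.)."""
    difference = mean_a - mean_b
    return n_per_treatment * difference ** 2 / 2


def implied_within_mean_square(ms_between, f_ratio):
    """Within-treatments mean square implied by F = MS_between / MS_within."""
    return ms_between / f_ratio


# Treatment means from Table I (searching, dependency), N = 20 per treatment.
ms_named = between_treatments_mean_square(4.40, 3.75, 20)       # table: 4.22
ms_recalled = between_treatments_mean_square(4.75, 3.60, 20)    # table: 13.22
ms_recognized = between_treatments_mean_square(5.10, 4.05, 20)  # table: 11.03
print(ms_named, ms_recalled, ms_recognized)
print(implied_within_mean_square(4.22, 1.84))
```

The first three columns agree with the printed mean squares to within rounding, which also confirms the decimal placement of the table entries.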

Hypothesis I. The mean number of concepts attained (NCA) 
will be greater for subjects in the “searching” treatment than it 
will be for subjects in the “dependency” treatment. 

As Table I shows, treatment differences were in the predicted
direction, but were not statistically significant. Incidentally, they
were significant for low ability subjects taken alone.

Hypothesis II. The mean number of series required for con- 
cept attainment (SCA) will be less for subjects in the “searching” 
treatment than for subjects in the “dependency” treatment. 

A sign test for the significance of the direction of differences
showed that only seven out of twelve differences between
treatments were in the predicted direction. This gives us a z-score*
of only 0.058, definitely not statistically significant.
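
For readers who wish to see how a sign test of this kind is usually evaluated, the sketch below computes the normal-approximation z for seven successes in twelve comparisons under a null probability of 0.5. The function name is hypothetical, and whether the original computation used a continuity correction is not stated, so both variants are shown. The sketch is not claimed to reproduce the printed figure.

```python
import math


def sign_test_z(successes, n, continuity=False):
    """Normal-approximation z for a sign test with p = 0.5 under the null."""
    expected = n * 0.5
    sd = math.sqrt(n * 0.25)
    deviation = abs(successes - expected)
    if continuity:
        deviation = max(deviation - 0.5, 0.0)   # Yates-style correction
    return deviation / sd


z_plain = sign_test_z(7, 12)
z_corrected = sign_test_z(7, 12, continuity=True)
print(round(z_plain, 3), round(z_corrected, 3))
```

Either way the value falls far short of the 1.96 conventionally required for significance at the 0.05 level (two-tailed), in line with the conclusion drawn in the text.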

Hypothesis III. The mean number of concepts for which one 
can recall the meaning (RceM) will be greater for “searching” 
treatment subjects than for “dependency” treatment subjects. 

Hypothesis IV. The mean number of concepts for which one
can recognize the meaning (RgM) will be greater for “searching”
treatment subjects than for “dependency” treatment subjects.

* (raw score - mean)/standard deviation.

Hypothesis V. Of all the concepts attained by subjects, the
mean percentage of these concepts for which the subjects can
recall the meaning (definition) will be greater for “searching”
treatment subjects than for “dependency” treatment subjects.

As Table I indicates, concerning Hypotheses III, IV, and V, 
in all cases the overall treatment differences are statistically 
significant in the hypothesized direction. 


DISCUSSION 


Our interpretation of the above results may be summed up as 
follows: Within the restrictions of the procedures of this study, 
it is shown that subjects who are allowed to search for a correct 
answer (S-group) compare with subjects who are told the correct 
answer (D-group) in the following ways: 

(a) No significant differences in number of concepts named 
correctly. 

(b) No significant differences in number of series presenta- 
tions required to name concepts correctly. 

(c) S-group learns to recall and recognize significantly more 
definitions of concepts than D-group. 

(d) S-group (more than D-group) learns to recall definitions 
of a greater percentage of concepts they learned to name correctly. 

There are certain qualifications and suggestions for further
study which may be noted at this point.

A very important suggestion arises as a consequence of the 
positive results of Hypotheses III, IV, and V. The rationale for 
these hypotheses was that the “searching” treatment, more than 
the “dependency” treatment, induced a set for “formulating and 
testing hypotheses” and that this hypothesizing and testing ac- 
tivity was the main factor in accounting for treatment differences 
in meaningfulness of learning. Now that significant differences 
have been found, it is advisable to investigate further along two 
lines suggested below. 

First, an attempt should be made to get at the differences in
problem solving processes of subjects in each of the treatments.
Secondly, an attempt should be made to get information on
differences in incentive factors between treatments. That is, now
that we have found genuine differences between treatments on
our dependent variable measures, we want to know more
specifically how these treatments influence learning. Thus, the question
remains open and testable: To what extent are observed
differences between treatments due to “hypothesizing and testing
activity” and to what extent are they due to “differences in
motivational characteristics”?

Further research on Hypotheses I and II is also suggested by 
two observations. First, an association interference variable was 
not equated between treatments and operated against the pre- 
diction. Only one incorrect association could be made by de- 
pendency treatment subjects before they were given the correct 
name of a concept card. It was possible for “searching” treat- 
ment subjects to make as many as five incorrect associations 
with a given card. For experimental purposes, it would be pos- 
sible and desirable to equate this variable between treatments. 
Secondly, limiting ourselves to only ten series of cards seriously 
restricted the range of concept attainment scores for our sample 
of subjects. What effect more trials, and greater discrimination, 
would have on our results is a matter for further experimentation. 


SUMMARY 


Two informative feedback procedures were developed to pro- 
duce hypothesized variations in searching orientation of subjects. 
Searching orientation was not directly observed; however,
predictions were made concerning how variations in searching
orientation would affect performance in naming and defining concepts.
As predicted, the group assumed to do more searching learned to 
give significantly more definitions of concepts. This major posi- 
tive result suggests the value of more direct study of the hy- 
pothesized psychological processes presumably induced by the 
feedback procedures. The hypotheses that the searching group 
would learn to name more concepts in fewer trials were not sup- 
ported. 
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